From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Matheus Alcantara <matheusssilv97(at)gmail(dot)com> |
Cc: | Arseniy Mukhin <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com>, Rishu Bagga <rishu(dot)postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Daniil Davydov <3danissimo(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joel Jacobson <joel(at)compiler(dot)org> |
Subject: | Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue |
Date: | 2025-09-15 17:40:40 |
Message-ID: | CAD21AoBAcr6nAx5C4soYO3iR9dda_k8hFLok1Psd0R070qs9hg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Sep 6, 2025 at 7:15 AM Matheus Alcantara
<matheusssilv97(at)gmail(dot)com> wrote:
>
> On Fri Sep 5, 2025 at 9:56 AM -03, Arseniy Mukhin wrote:
> >> Ok, I agree that that this may happen but I don't see this as a common
> >> case to fix the issue based on this behaviour. I think that we check the
> >> transaction status also to skip notifications that were added on the
> >> queue by transactions that are not fully committed yet, and I see this
> >> scenario as a more common case but I could be wrong.
> >>
> >
> > IIUC we don't need clog to check if the notification transaction is
> > still in progress, we use a snapshot for that (XidInMVCCSnapshot()).
> > If we see that the notification transaction is still in progress, we
> > postpone processing of that notification. If we see that notification
> > transaction is not in progress, then we check its status with
> > TransactionIdDidCommit() (using clog) and only then fail if clog
> > doesn't have notification transcation data (as a quick check, we can
> > see in the stack trace that the failure occurs during the call to
> > TransactionIdDidCommit, not XidInMVCCSnapshot).
> >
> > So we need to check notification transaction status only to understand
> > if a transaction was committed and we can send notification or it was
> > aborted (somewhere in between PreCommit_Notify and AtCommit_Notify)
> > and we should skip notification. And the solution that Yura Sokolov
> > suggested is to introduce a 'commited' flag, so we can check it
> > instead of calling TransactionIdDidCommit. Another option is to make
> > aborted transactions to clean up after themselves in AtAbort_Notify(),
> > so if listener see notification in the queue and transaction that
> > wrote it is not in progress, then such notification was always created
> > by committed transaction and we also don't need to call
> > TransactionIdDidCommit. One way for aborted transactions to clean up
> > is to set its notifications dbOid to 0, so listeners skip them.
> > Another option I think is to set queue HEAD back to the position where
> > it was when the aborted transaction started to write to the queue. It
> > also makes such aborted transaction notifications 'invisible' for
> > listeners. We have a global lock and there is always only one writer,
> > so it sounds possible. And there is another option which is the patch
> > that Rishu Bagga is working on. In the patch all transactions write to
> > the listen/notify queue only after they wrote a commit record to the
> > WAL, so all notifications in the queue are always by committed
> > transactions and we don't need to call TransactionIdDidCommit.
> >
> You are right, I missed the XidInMVCCSnapshot() call, sorry about that
> and thanks very much for all the explanation!
>
> I'll keep on hold the development of new patch versions for this thread
> and focus on review and test the patch from Rishu at [1] to see if we
> can make progress using the WAL approach.
While the WAL-based approach discussed on another thread is promising,
I think it would not be acceptable for back branches as it requires
quite a lot of refactoring. Given that this is a long-standing bug in
listen/notify, I think we can continue discussing how to fix the issue
on backbranches on this thread.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Sophie Alpert | 2025-09-15 17:41:27 | Re: Fix missing EvalPlanQual recheck for TID scans |
Previous Message | Marcos Pegoraro | 2025-09-15 17:26:22 | Changing on SGML Notes |