Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

From: "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com>
To: "Arseniy Mukhin" <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com>
Cc: "Masahiko Sawada" <sawada(dot)mshk(at)gmail(dot)com>, "Rishu Bagga" <rishu(dot)postgres(at)gmail(dot)com>, "Yura Sokolov" <y(dot)sokolov(at)postgrespro(dot)ru>, "Daniil Davydov" <3danissimo(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, "Alexandra Wang" <alexandra(dot)wang(dot)oss(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joel Jacobson" <joel(at)compiler(dot)org>
Subject: Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Date: 2025-09-23 22:39:57
Message-ID: DD0JHP2L085C.19I49XELORB20@gmail.com
Lists: pgsql-hackers

On Tue Sep 23, 2025 at 1:11 PM -03, Arseniy Mukhin wrote:
>> Thanks for the explanation! I'm just not sure I understand why we
>> need the LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE) on
>> PreCommit_Notify() if we already have the
>> LockSharedObject(DatabaseRelationId, InvalidOid, 0, AccessExclusiveLock);
>>
>
> Good question. It seems that it would be enough to hold
> NotifyQueueLock only during the head update, so I don't understand it
> either.
>
IIUC, we acquire the LockSharedObject(DatabaseRelationId,
InvalidOid, 0, AccessExclusiveLock) to make other COMMITs wait, and the
LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE) to make listener backends
wait while we are adding the entries to the queue.

> Thank you for the new version!
>
> The fix looks exactly how I thought about it. But while thinking about
> why we need to hold NotifyQueueLock in PreCommit_Notify I realized
> there is a concurrency bug in the 'head resetting' approach. I thought
> that listeners hold NotifyQueueLock lock while reading notifications
> from the queue, but now I see that they hold it only while reading the
> current head position. So we can end up in the next situation:
>
> There are 2 backends: listener and writer (backend with notifications)
>
>
> listener writer
> ----------------------------------------------------------------------
> writer wants to commit, so it
> adds notifications to the queue
> in PreCommit_Notify() and advances
> queue head.
>
> ----------------------------------------------------------------------
> listener wants to read notifications. It
> gets into asyncQueueReadAllNotifications()
> and reads the current head (it already
> contains writer's notifications):
>
>
> asyncQueueReadAllNotifications()
> ...
> LWLockAcquire(NotifyQueueLock, LW_SHARED);
> head = QUEUE_HEAD;
> LWLockRelease(NotifyQueueLock);
> ...
>
> ----------------------------------------------------------------------
> writer failed to commit, so it
> resets queue head in AtAbort_Notify()
> and completes the transaction.
>
> ----------------------------------------------------------------------
> listener gets a snapshot where the writer
> is not in progress.
>
> ...
> snapshot = RegisterSnapshot(GetLatestSnapshot());
> ...
>
>
> This way the listener reads the head that includes all writer's
> notifications and a snapshot where the writer is not in progress, so
> nothing stops the listener from sending these notifications and it's
> even possible to have the listener's position that is after the queue
> head, so yes, it's bad :( Sorry about that.
>
Yeah, this is bad. I'm wondering if we could reproduce such race
condition scenarios with some TAP tests.

> Probably we can fix it (swap GetLatestSnapshot() with the head
> position reading for example) but now I think that 'committed' field
> approach is a more robust way to fix the bug. What do you think?
>
I also agree that the committed field seems a safer approach.

> BTW one thing about 'committed' field fix version [0]. It seems that
> instead of remembering all notification positions, we can remember the
> head position after we acquire global lock and before we write
> anything to the queue and in case of abort we can just start from the
> saved position and mark all notifications until the end of the queue
> as 'committed = false'. The reason is the same as with resetting the
> head approach - as we hold global lock, no writers can add
> notifications to the queue, so when we are in AtAbort_Notify() all
> notifications after the saved position are ours.
>
See the new attached version, which implements this idea using the
committed field approach. I was just a bit concerned about a race
condition where the QUEUE_HEAD is changed by another publisher process:
if we simply iterated over all entries from the saved previous
QUEUE_HEAD position up to the current QUEUE_HEAD, we could mistakenly
mark successfully committed notifications as not committed. So in this
new patch version I save the QUEUE_HEAD position before we add the
entries to the shared queue and also the QUEUE_HEAD position after
these entries are added, so we ensure that we only process the entries
in this range, although the global
LockSharedObject(DatabaseRelationId) lock we hold may already prevent
this situation.

What do you think?

--
Matheus Alcantara

Attachment Content-Type Size
v3-0001-Make-AsyncQueueEntry-s-self-contained.patch text/plain 19.1 KB
