Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

From: "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com>
To: "Jacques Combrink" <jacques(at)quantsolutions(dot)co(dot)za>, "Masahiko Sawada" <sawada(dot)mshk(at)gmail(dot)com>
Cc: "Daniil Davydov" <3danissimo(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, "Alexandra Wang" <alexandra(dot)wang(dot)oss(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Date: 2025-09-03 20:35:49
Message-ID: DCJGBRB9RUV4.39SNC2UAKVCG3@gmail.com
Lists: pgsql-hackers

On Mon Sep 1, 2025 at 11:06 AM -03, Jacques Combrink wrote:
> TLDR:
> active listener on one database causes notify on another database to get
> stuck.
> At no point could I get a stuck notify if I don't have a listener on at
> least one other database than the one I am notifying on. See the Extra
> weirdness section.
> At no point do you need to have any other queries running, there is
> never an idle in transaction query needed for bad timing with the vacuum.
>
> I hope I explained everything well enough so that one of you smart
> people can find and fix the problem.
>
The long-running transaction steps are just one example of how we can
lose notifications with the first patch from Daniil that Alex shared in
[1]. The steps that you've shared are another way to trigger the issue,
but they are similar to the steps that Alex also shared in [1].

All these different ways to trigger the error hit the same underlying
problem: if a notification is kept in the queue long enough for VACUUM
FREEZE to run and truncate the clog files that contain the transaction
information for that notification, the error will happen.
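
To make the failure mode concrete, here is a minimal toy model in C of
the idea suggested by the patch title: before VACUUM FREEZE advances the
freeze cutoff, clamp it to the oldest xid still referenced by a queued
notification, so the clog for that xid is not truncated. All names here
(ToyXid, toy_xid_precedes, toy_queue_min_xid, toy_clamp_freeze_cutoff)
are hypothetical and are not taken from the patch or from async.c; the
only thing borrowed from PostgreSQL is the wraparound-aware comparison
on the signed 32-bit difference. Treat it as a sketch under those
assumptions, not as the actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit transaction id. */
typedef uint32_t ToyXid;

/* Assumption for this model: values below 3 are reserved, not normal xids. */
#define TOY_FIRST_NORMAL_XID ((ToyXid) 3)

/* Wraparound-aware "a is older than b": look at the signed 32-bit difference. */
static bool
toy_xid_precedes(ToyXid a, ToyXid b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Find the oldest xid among the queued notifications.
 * Returns false (and leaves *min_out untouched) when the queue is empty.
 */
static bool
toy_queue_min_xid(const ToyXid *queued, size_t n, ToyXid *min_out)
{
    if (n == 0)
        return false;

    ToyXid min = queued[0];

    for (size_t i = 1; i < n; i++)
    {
        if (toy_xid_precedes(queued[i], min))
            min = queued[i];
    }
    *min_out = min;
    return true;
}

/*
 * Clamp the proposed freeze cutoff so that it never moves past the oldest
 * xid still present in the notification queue. With an empty queue the
 * cutoff is returned unchanged.
 */
static ToyXid
toy_clamp_freeze_cutoff(ToyXid cutoff, const ToyXid *queued, size_t n)
{
    ToyXid oldest;

    assert(cutoff >= TOY_FIRST_NORMAL_XID);

    if (toy_queue_min_xid(queued, n, &oldest) &&
        toy_xid_precedes(oldest, cutoff))
        return oldest;

    return cutoff;
}
```

In this model, a queue holding xids 1000, 700 and 900 pulls a proposed
cutoff of 800 back to 700, while a cutoff of 600 is left alone because
no queued xid is older than it.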

The patch that I attached in [2] aims to fix the issue for the steps
that you've shared, but during testing I found a stack overflow bug in
AsyncQueueIterNextNotification() caused by the number of notifications.
I'm attaching a new version that fixes this bug. I tried to reproduce
your steps with this new version and the issue seems to be fixed.

Note that notifications that were added without any previous LISTEN will
block the xid advance during VACUUM FREEZE until we have a listener on
the database that owns these notifications. The XXX comment in vacuum.c
is about this problem.

[1] https://www.postgresql.org/message-id/CAK98qZ3wZLE-RZJN_Y%2BTFjiTRPPFPBwNBpBi5K5CU8hUHkzDpw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAFY6G8cJm73_MM9SuynZUqtqcaTuepUDgDuvS661oLW7U0dgsg%40mail.gmail.com

--
Matheus Alcantara

Attachment Content-Type Size
v2-0001-Consider-LISTEN-NOTIFY-min-xid-during-VACUUM-FREE.patch text/plain 16.5 KB
