| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | Matheus Alcantara <matheusssilv97(at)gmail(dot)com> |
| Cc: | Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Joel Jacobson <joel(at)compiler(dot)org>, Arseniy Mukhin <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com>, Rishu Bagga <rishu(dot)postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Daniil Davydov <3danissimo(at)gmail(dot)com>, Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Subject: | Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue |
| Date: | 2025-10-23 23:42:45 |
| Message-ID: | CAD21AoBKTF9SKSANX3xeXV9NjyDGYJeYEFN4duf--Tk=_exTYQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, Oct 22, 2025 at 10:25 AM Matheus Alcantara
<matheusssilv97(at)gmail(dot)com> wrote:
>
> On Wed Oct 22, 2025 at 12:49 PM -03, Álvaro Herrera wrote:
> > On 2025-Oct-22, Matheus Alcantara wrote:
> >
> >> I'm wondering if the 002_aborted_tx_notifies.pl is still needed with
> >> this architecture being used. I think that it's not, but perhaps is a
> >> good test to keep it?
> >
> > I'd rather have tests than not, but I'd think it needs to be behind
> > PG_TEST_EXTRA because of things like
> >
> > +$node->safe_psql('postgres', 'select consume_xids(10000000);');
> >
> Attached v10 with wrapping into PG_TEST_EXTRA.
Thank you for updating the patches!
I've reviewed the patches and here are the comments.
v10-0001-Prevent-VACUUM-from-truncating-XIDs-still-presen.patch:
+ *
+ * Returns InvalidTransactionId if there are no unprocessed notifications.
This comment is not accurate since the function returns
InvalidTransactionId even if all unprocessed notifications are created
on other databases.
---
+TransactionId
+GetOldestQueuedNotifyXid(void)
+{
How about renaming it to something like GetOldestNotifyTransactionId()?
---
+ /* page_buffer must be adequately aligned, so use a union */
+ union
+ {
+ char buf[QUEUE_PAGESIZE];
+ AsyncQueueEntry align;
+ } page_buffer;
asyncQueueReadAllNotifications() uses this union too, so how about
define this type as AlignedQueueEntryPage or something and use it in
both functions?
---
+
+ LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE);
+
I don't think we need an exclusive lock here.
---
In GetOldestQueuedNotifyXid() why do we keep holding NotifyQueueLock
while calculating the oldest XID in the queue?
---
+
+ /* Process all entries on this page up to head */
+ while (curoffset + QUEUEALIGN(AsyncQueueEntryEmptySize) <=
QUEUE_PAGESIZE &&
+ !QUEUE_POS_EQUAL(pos, head))
+ {
+ qe = (AsyncQueueEntry *) (page_buffer.buf + curoffset);
(snip)
+ /* Advance to next entry */
+ if (asyncQueueAdvance(&pos, qe->length))
+ break; /* advanced to next page */
+
I think the check "curoffset + QUEUEALIGN(AsyncQueueEntryEmptySize) <=
QUEUE_PAGESIZE" is essentially the same as the one we do in
asyncQueueAdvance(). I think we can refactor the inner loop in
GetOldestQueuedNotifyXid() to something like:
for (;;)
{
bool reachedEndOfPage;
qe = (AsyncQueueEntry *) (page_buffer.buf + curoffset);
// check qe->xid here...
reachedEndOfPage = asyncQueueAdvance(&pos, qe->length);
if (reachedEndOfPage || QUEUE_POS_EQUAL(pos, head))
break;
}
---
v10-0002-Add-tap-tests-for-listen-notify-vacuum-freeze.patch:
This new test directory needs to have .gitignore.
---
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
It's better to have a short description of this test here.
---
+use File::Path qw(mkpath);
It seems not necessary.
---
+$node->safe_psql('postgres',
+ 'CREATE TABLE t AS SELECT g AS a, g+2 AS b from
generate_series(1,100000) g;'
Why does it need to insert many rows?
---
+# --- Session 2, multiple notify's, and commit ---
+for my $i (1 .. 10)
+{
+ $node->safe_psql(
+ 'postgres', "
+ BEGIN;
+ NOTIFY s, '$i';
+ COMMIT;");
+}
Why does it need to send a notification 10 times?
---
+# Consume enough XIDs to trigger truncation
+$node->safe_psql('postgres', 'select consume_xids(10000000);');
+
I guess it consumes XID too much to trigger truncation. Given the one
clog segment file is 256kB in size, it's enough to consume 1,050,000
XIDs to move to the next segment file.
---
+# Get the new datfrozenxid after vacuum freeze to ensure that is advanced but
+# we can still get the notification status of the notification
+my $datafronzenxid_freeze = $node->safe_psql('postgres', "select
datfrozenxid from pg_database where datname = 'postgres'");
+ok($datafronzenxid_freeze > $datafronzenxid, 'datfrozenxid is advanced');
I think this test passes in all cases, so it is meaningless. Instead,
what we need to check in terms of datfrozenxid is that its value
doesn't get greater than the XID we used for the notification, even
after vacuum freeze. I think we remember the XID used for NOTIFY and
check if the old/new datfrozenxid values are older than that value.
---
+is($notifications_count, 10, 'received all committed notifications');
+
+done_testing();
After consuming the unconsumed notifications, let's do vacuum freeze
and check if datfrozenxid now can be advanced.
---
I would expect to add 002_aborted_tx_notifies.pl in a separate patch
since it's not related to this bug fix.
---
+# Test checks that listeners do not receive notifications from aborted
+# transaction even if notifications have been added to the listen/notify
+# queue. To reproduce it we use the fact that serializable conflicts
+# are checked after tx adds notifications to the queue.
I wonder if we could implement this test using the isolation test
instead of the tap test. Is there any reason why you used a tap test
for that?
Regards,
> Should we enable this
> somewhere to be executed on build farm?
Yeah, I hope some buildfarm animals enable it.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Ryohei Takahashi (Fujitsu) | 2025-10-24 00:00:42 | RE: Can we use Statistics Import and Export feature to perforamance testing? |
| Previous Message | Michael Paquier | 2025-10-23 23:33:17 | Re: [Proposal] Adding callback support for custom statistics kinds |