From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com> |
Subject: | Random subscription 021_twophase test failure on kestrel |
Date: | 2025-05-23 15:25:27 |
Message-ID: | CALDaNm329QaZ+bwU--bW6GjbNSZ8-38cDE8QWofafub7NV67oA@mail.gmail.com |
Lists: | pgsql-hackers |
Hi,
The 021_twophase test has failed on Kestrel at [1] with the following error:
# Failed test 'should be no prepared transactions on subscriber'
# at /home/bf/bf-build/kestrel/HEAD/pgsql/src/test/subscription/t/021_twophase.pl line 438.
# got: '1'
# expected: '0'
# Looks like you failed 1 test of 30.
This failure is caused by a prepared transaction that had not yet been
committed on the subscriber because one of the subscriptions was lagging. The test
involves two subscriptions: tap_sub and tap_sub_copy. After committing
the prepared transaction 'mygid', the test only waits for tap_sub_copy
to catch up:
$node_publisher->wait_for_catchup($appname_copy);
However, tap_sub is dropped without first ensuring that it has replayed the
commit of the 'mygid' prepared transaction, which leaves a prepared
transaction behind on the subscriber:
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");
When the test later checks for the number of prepared transactions, it
fails because tap_sub had not finished applying the commit:
# at line 438
# got: '1'
# expected: '0'
This issue can be consistently reproduced by injecting a delay (e.g.,
3 seconds) in tap_sub's walsender while decoding the commit of
'mygid'. A patch demonstrating this behavior is provided as
021_two_phase_test_failure_reproduce.patch. The test can be fixed by
explicitly waiting for both subscriptions to catch up before dropping
either. A patch implementing this fix is attached.
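A minimal sketch of the reordered steps, not the attached patch itself. It assumes the test script's existing `$node_publisher` and `$node_subscriber` objects and its `$appname_copy` variable, plus a `$appname` variable holding tap_sub's application_name (that name is an assumption here; the attached patch is authoritative). It is a fragment of the TAP script and only runs inside the PostgreSQL test harness.

```perl
# Wait until BOTH subscriptions have replayed the commit of 'mygid'
# before dropping either one. $appname (tap_sub's application_name)
# is assumed; $appname_copy is what the test already waits for.
$node_publisher->wait_for_catchup($appname);
$node_publisher->wait_for_catchup($appname_copy);

# Only now is it safe to drop tap_sub.
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub");

# The later check no longer races against tap_sub's apply worker.
my $result = $node_subscriber->safe_psql('postgres',
	"SELECT count(*) FROM pg_prepared_xacts;");
is($result, qq(0), 'should be no prepared transactions on subscriber');
```

Waiting on each subscription separately matters because `wait_for_catchup` only checks the walsender identified by the given application name, so catching up on tap_sub_copy says nothing about tap_sub.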
Thanks, Amit, for the offline discussion and for sharing your thoughts on this.
[1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2025-05-22%2021%3A19%3A22
Regards,
Vignesh
Attachment | Content-Type | Size |
---|---|---|
0001-Fix-random-021_twophase-test-failure.patch | text/x-patch | 1.5 KB |
test_failure_fix.patch | text/x-patch | 865 bytes |