Re: Test 031_recovery_conflict.pl is not immune to autovacuum

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Test 031_recovery_conflict.pl is not immune to autovacuum
Date: 2024-03-21 10:00:00
Message-ID: 896a43eb-b757-4fd0-e70a-6eb65a974903@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

20.03.2024 22:24, Melanie Plageman wrote:
>> [13:27:39.034](2.317s) ok 14 - startup deadlock: lock acquisition is waiting
>> Waiting for replication conn standby's replay_lsn to pass 0/343E6D0 on primary
>> done
>> timed out waiting for match: (?^:User transaction caused buffer deadlock with recovery.) at t/031_recovery_conflict.pl
>> line 318.
>> # Postmaster PID for node "primary" is 1523036
>> ### Stopping node "primary" using mode immediate
>>
>> 031_recovery_conflict_standby.log really doesn't contain the expected
>> message. I can share log files from a successful and failed test runs, if
>> they can be helpful, or I'll investigate this case today/tomorrow.
> Hmm. The log file from the failed test run with
> (autovacuum_enabled=false) would be helpful. I can't tell without the
> log if it hit a different type of conflict. Unfortunately it was very
> difficult to trigger the specific type of recovery conflict we were
> trying to test and not hit another of the recovery conflict types
> first. It'll take me some time to swap this back in my head, though.

Please look at the attached logs. For a successful run, I see in 031_recovery_conflict_standby.log:
2024-03-20 13:28:08.084 UTC|||65fae466.1744d7|CONTEXT:  WAL redo at 0/347F9B8 for Heap2/PRUNE: snapshotConflictHorizon:
0, nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,
100, 101]; blkref #0: rel 1663/16385/16396, blk 0
2024-03-20 13:28:08.084 UTC|vagrant|test_db|65fae467.174503|ERROR: canceling statement due to conflict with recovery at
character 15
2024-03-20 13:28:08.084 UTC|vagrant|test_db|65fae467.174503|DETAIL: User transaction caused buffer deadlock with recovery.
2024-03-20 13:28:08.084 UTC|||65fae466.1744d7|LOG:  recovery finished waiting after 10.432 ms: recovery conflict on
buffer pin
2024-03-20 13:28:08.084 UTC|||65fae466.1744d7|CONTEXT:  WAL redo at 0/347F9B8 for Heap2/PRUNE: snapshotConflictHorizon:
0, nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,
100, 101]; blkref #0: rel 1663/16385/16396, blk 0
2024-03-20 13:28:08.084 UTC|||65fae466.1744d7|LOG:  REDO @ 0/347FAB8; LSN 0/347FB00: prev 0/347F9B8; xid 0; len 7;
blkref #0: rel 1663/16385/16396, blk 0 - Heap2/FREEZE_PAGE: snapshotConflictHorizon: 762, nplans: 1, isCatalogRel: F,
plans: [{ xmax: 0, infomask: 2817, infomask2: 2, ntuples: 1, offsets: [1] }]

And the corresponding fragment from 031_recovery_conflict_primary.log:
2024-03-20 13:28:07.846 UTC|vagrant|test_db|65fae467.174549|LOG: xlog flush request 0/347DCB0; write 0/0; flush 0/0
2024-03-20 13:28:07.846 UTC|vagrant|test_db|65fae467.174549|CONTEXT:  writing block 16 of relation base/16385/1249
2024-03-20 13:28:07.847 UTC|vagrant|test_db|65fae467.174549|LOG: statement: VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|LOG: INSERT @ 0/347FAB8:  - Heap2/PRUNE:
snapshotConflictHorizon: 0, nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused:
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93,
94, 95, 96, 97, 98, 99, 100, 101]
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|CONTEXT:  while scanning block 0 of relation
"public.test_recovery_conflict_table1"
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|STATEMENT:  VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|LOG: INSERT @ 0/347FB00:  - Heap2/FREEZE_PAGE:
snapshotConflictHorizon: 762, nplans: 1, isCatalogRel: F, plans: [{ xmax: 0, infomask: 2817, infomask2: 2, ntuples: 1,
offsets: [1] }]
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|CONTEXT:  while scanning block 0 of relation
"public.test_recovery_conflict_table1"
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|STATEMENT:  VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|LOG: INSERT @ 0/3481B50:  - XLOG/FPI_FOR_HINT:
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|CONTEXT:  while scanning block 0 of relation
"public.test_recovery_conflict_table1"
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|STATEMENT:  VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|LOG: INSERT @ 0/3483BA0:  - XLOG/FPI_FOR_HINT:
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|CONTEXT:  while scanning relation
"public.test_recovery_conflict_table1"
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|STATEMENT:  VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|LOG: INSERT @ 0/3485BF0:  - XLOG/FPI_FOR_HINT:
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|CONTEXT:  while scanning relation
"public.test_recovery_conflict_table1"
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|STATEMENT:  VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|LOG: INSERT @ 0/3485CB0:  - Heap/INPLACE: off: 4
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|STATEMENT:  VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|LOG: INSERT @ 0/3485D10:  - Standby/INVALIDATIONS: ; inval
msgs: catcache 55 catcache 54 relcache 16396
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|STATEMENT:  VACUUM FREEZE test_recovery_conflict_table1;

Whilst 031_recovery_conflict_primary.log for the failed run contains:
2024-03-20 13:27:39.042 UTC|vagrant|test_db|65fae44b.17419a|LOG: statement: VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:27:39.043 UTC|vagrant|test_db|65fae44b.17419a|LOG: INSERT @ 0/343E718:  - Heap2/FREEZE_PAGE:
snapshotConflictHorizon: 751, nplans: 1, isCatalogRel: F, plans: [{ xmax: 0, infomask: 2817, infomask2: 2, ntuples: 1,
offsets: [1] }]
2024-03-20 13:27:39.043 UTC|vagrant|test_db|65fae44b.17419a|CONTEXT:  while scanning block 0 of relation
"public.test_recovery_conflict_table1"
2024-03-20 13:27:39.043 UTC|vagrant|test_db|65fae44b.17419a|STATEMENT:  VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:27:39.043 UTC|vagrant|test_db|65fae44b.17419a|LOG: INSERT @ 0/343E7D8:  - Heap/INPLACE: off: 4
2024-03-20 13:27:39.043 UTC|vagrant|test_db|65fae44b.17419a|STATEMENT:  VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:27:39.043 UTC|vagrant|test_db|65fae44b.17419a|LOG: INSERT @ 0/343E838:  - Standby/INVALIDATIONS: ; inval
msgs: catcache 55 catcache 54 relcache 16392
2024-03-20 13:27:39.043 UTC|vagrant|test_db|65fae44b.17419a|STATEMENT:  VACUUM FREEZE test_recovery_conflict_table1;

(there is no Heap2/PRUNE record)

I've modified the test as follows:
--- a/src/test/recovery/t/031_recovery_conflict.pl
+++ b/src/test/recovery/t/031_recovery_conflict.pl
@@ -59,7 +59,7 @@ my $table1 = "test_recovery_conflict_table1";
 my $table2 = "test_recovery_conflict_table2";
 $node_primary->safe_psql(
        $test_db, qq[
-CREATE TABLE ${table1}(a int, b int);
+CREATE TABLE ${table1}(a int, b int) WITH (autovacuum_enabled = false);
 INSERT INTO $table1 SELECT i % 3, 0 FROM generate_series(1,20) i;
 CREATE TABLE ${table2}(a int, b int);
 ]);

Best regards,
Alexander

Attachment Content-Type Size
031-failure.tar.bz2 application/x-bzip 132.0 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jelte Fennema-Nio 2024-03-21 10:31:17 Re: Refactoring backend fork+exec code
Previous Message Jelte Fennema-Nio 2024-03-21 09:58:40 Re: Flushing large data immediately in pqcomm