Re: Minimal logical decoding on standbys

From: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, fabriziomello(at)gmail(dot)com, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rahila Syed <rahila(dot)syed(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2023-04-11 08:20:11
Message-ID: 7eb26a22-a6b2-f938-6d3c-6adcf4785004@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 4/11/23 7:36 AM, Noah Misch wrote:
> On Fri, Apr 07, 2023 at 11:12:26AM -0700, Andres Freund wrote:
>> --- /dev/null
>> +++ b/src/test/recovery/t/035_standby_logical_decoding.pl
>> @@ -0,0 +1,720 @@
>> +# logical decoding on standby : test logical decoding,
>> +# recovery conflict and standby promotion.
> ...
>> +$node_primary->append_conf('postgresql.conf', q{
>> +wal_level = 'logical'
>> +max_replication_slots = 4
>> +max_wal_senders = 4
>> +log_min_messages = 'debug2'
>> +log_error_verbosity = verbose
>> +});
>
> Buildfarm member hoverfly stopped reporting in when this test joined the tree.
> It's currently been stuck here for 140 minutes:
>

Thanks for the report!

It's looping on:

2023-04-11 02:57:52.516 UTC [62718288:5] 035_standby_logical_decoding.pl LOG: 00000: statement: SELECT restart_lsn IS NOT NULL
FROM pg_catalog.pg_replication_slots WHERE slot_name = 'promotion_inactiveslot'

And the reason is that the slot is not being created:

$ grep "CREATE_REPLICATION_SLOT" 035_standby_logical_decoding_standby.log | tail -2
2023-04-11 02:57:47.287 UTC [9241178:15] 035_standby_logical_decoding.pl STATEMENT: CREATE_REPLICATION_SLOT "otherslot" LOGICAL "test_decoding" ( SNAPSHOT 'nothing')
2023-04-11 02:57:47.622 UTC [9241178:23] 035_standby_logical_decoding.pl STATEMENT: CREATE_REPLICATION_SLOT "otherslot" LOGICAL "test_decoding" ( SNAPSHOT 'nothing')

Not sure why the slot is not being created.

There is also "replication apply delay" increasing:

2023-04-11 02:57:49.183 UTC [13304488:253] DEBUG: 00000: sendtime 2023-04-11 02:57:49.111363+00 receipttime 2023-04-11 02:57:49.183512+00 replication apply delay 644 ms transfer latency 73 ms
2023-04-11 02:57:49.184 UTC [13304488:259] DEBUG: 00000: sendtime 2023-04-11 02:57:49.183461+00 receipttime 2023-04-11 02:57:49.1842+00 replication apply delay 645 ms transfer latency 1 ms
2023-04-11 02:57:49.221 UTC [13304488:265] DEBUG: 00000: sendtime 2023-04-11 02:57:49.184166+00 receipttime 2023-04-11 02:57:49.221059+00 replication apply delay 682 ms transfer latency 37 ms
2023-04-11 02:57:49.222 UTC [13304488:271] DEBUG: 00000: sendtime 2023-04-11 02:57:49.221003+00 receipttime 2023-04-11 02:57:49.222144+00 replication apply delay 683 ms transfer latency 2 ms
2023-04-11 02:57:49.222 UTC [13304488:277] DEBUG: 00000: sendtime 2023-04-11 02:57:49.222095+00 receipttime 2023-04-11 02:57:49.2228+00 replication apply delay 684 ms transfer latency 1 ms

Noah, I think hoverfly is yours, would it be possible to have access (I'm not an AIX expert though) or check if you see a slot creation hanging and if so why?

> ===
> $ tail -n5 regress_log_035_standby_logical_decoding
> [02:57:48.390](0.100s) ok 66 - otherslot on standby not dropped
>
> ### Reloading node "standby"
> # Running: pg_ctl -D /scratch/nm/farm/xlc64v16/HEAD/pgsql.build/src/test/recovery/tmp_check/t_035_standby_logical_decoding_standby_data/pgdata reload
> server signaled
> ===
>
> I've posted a tarball of the current logs at
> https://drive.google.com/file/d/1JIZ5hSHBsKjEgU5WOGHOqXB7Z_-9XT5u/view?usp=sharing.
> The test times out (PG_TEST_TIMEOUT_DEFAULT=5400), and uploading logs then
> fails with 413 Request Entity Too Large. Is the above
> log_min_messages='debug2' important? Removing that may make the logs small
> enough to upload normally.

I think debug2 might still be useful while investigating this issue (I'll compare a working TAP run with this one).

Regards

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2023-04-11 08:20:43 Re: [PoC] pg_upgrade: allow to upgrade publisher node
Previous Message Dave Page 2023-04-11 08:05:31 Re: When to drop src/tools/msvc support