Re: Synchronizing slots from primary to standby

From: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-10-27 15:13:43
Message-ID: afe4ab6c-dde3-48ea-acd8-6f6052c7b8fd@gmail.com
Lists: pgsql-hackers

Hi,

On 10/27/23 11:56 AM, shveta malik wrote:
> On Wed, Oct 25, 2023 at 3:15 PM Drouvot, Bertrand
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>>
>> Hi,
>>
>> On 10/25/23 5:00 AM, shveta malik wrote:
>>> On Tue, Oct 24, 2023 at 11:54 AM Drouvot, Bertrand
>>> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 10/23/23 2:56 PM, shveta malik wrote:
>>>>> On Mon, Oct 23, 2023 at 5:52 PM Drouvot, Bertrand
>>>>> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>>>>
>>>>>> We are waiting for DEFAULT_NAPTIME_PER_CYCLE (3 minutes) before checking if there
>>>>>> is new synced slot(s) to be created on the standby. Do we want to keep this behavior
>>>>>> for V1?
>>>>>>
>>>>>
>>>>> I think for the slotsync workers case, we should reduce the naptime in
>>>>> the launcher to say 30sec and retain the default one of 3mins for
>>>>> subscription apply workers. Thoughts?
>>>>>
>>>>
>>>> Another option could be to keep DEFAULT_NAPTIME_PER_CYCLE and create a new
>>>> API on the standby that would refresh the list of sync slot at wish, thoughts?
>>>>
>>>
>>> Do you mean API to refresh list of DBIDs rather than sync-slots?
>>> As per current design, launcher gets DBID lists for all the failover
>>> slots from the primary at intervals of DEFAULT_NAPTIME_PER_CYCLE.
>>
>> I mean an API to get a newly created slot on the primary being created/synced on
>> the standby at wish.
>>
>> Also let's imagine this scenario:
>>
>> - create logical_slot1 on the primary (and don't start using it)
>>
>> Then on the standby we'll get things like:
>>
>> 2023-10-25 08:33:36.897 UTC [740298] LOG: waiting for remote slot "logical_slot1" LSN (0/C00316A0) and catalog xmin (752) to pass local slot LSN (0/C0049530) and and catalog xmin (754)
>>
>> That's expected and due to the fact that ReplicationSlotReserveWal() does set the slot
>> restart_lsn to a value < at the corresponding restart_lsn slot on the primary.
>>
>> - create logical_slot2 on the primary (and start using it)
>>
>> Then logical_slot2 won't be created/synced on the standby until there is activity on logical_slot1 on the primary
>> that would produce things like:
>> 2023-10-25 08:41:35.508 UTC [740298] LOG: wait over for remote slot "logical_slot1" as its LSN (0/C005FFD8) and catalog xmin (756) has now passed local slot LSN (0/C0049530) and catalog xmin (754)
>
>
> Slight correction to above. As soon as we start activity on
> logical_slot2, it will impact all the slots on primary, as the WALs
> are consumed by all the slots. So even if there is activity on
> logical_slot2, logical_slot1 creation on standby will be unblocked and
> it will then move to logical_slot2 creation. eg:
>
> --on standby:
> 2023-10-27 15:15:46.069 IST [696884] LOG: waiting for remote slot
> "mysubnew1_1" LSN (0/3C97970) and catalog xmin (756) to pass local
> slot LSN (0/3C979A8) and and catalog xmin (756)
>
> on primary:
> newdb1=# select now();
> now
> ----------------------------------
> 2023-10-27 15:15:51.504835+05:30
> (1 row)
>
> --activity on mysubnew1_3
> newdb1=# insert into tab1_3 values(1);
> INSERT 0 1
> newdb1=# select now();
> now
> ----------------------------------
> 2023-10-27 15:15:54.651406+05:30
>
>
> --on standby, mysubnew1_1 is unblocked.
> 2023-10-27 15:15:56.223 IST [696884] LOG: wait over for remote slot
> "mysubnew1_1" as its LSN (0/3C97A18) and catalog xmin (757) has now
> passed local slot LSN (0/3C979A8) and catalog xmin (756)
>
> My Setup:
> mysubnew1_1 -->mypubnew1_1 -->tab1_1
> mysubnew1_3 -->mypubnew1_3-->tab1_3
>

I agree with your test case, but in mine I was not using pub/sub.

I was not clear enough. When I said:

>> - create logical_slot1 on the primary (and don't start using it)

I meant don't start decoding from it (for example with pg_recvlogical or
pg_logical_slot_get_changes()).

With pub/sub, the "don't start using it" condition is not satisfied.

My test case is:

"
SELECT * FROM pg_create_logical_replication_slot('logical_slot1', 'test_decoding', false, true, true);
SELECT * FROM pg_create_logical_replication_slot('logical_slot2', 'test_decoding', false, true, true);
pg_recvlogical -d postgres -S logical_slot2 --no-loop --start -f -
"
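
For illustration, the condition reported by the "waiting for remote slot" and
"wait over for remote slot" log lines quoted above can be sketched as follows.
This is only a sketch and not the patch's code: the function names are
hypothetical, and it assumes a plain integer comparison of the catalog xmin
values (ignoring transaction ID wraparound).

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '0/C00316A0' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def remote_has_passed_local(remote_lsn: str, remote_xmin: int,
                            local_lsn: str, local_xmin: int) -> bool:
    """Return True once the remote slot is no longer behind the local one.

    Assumption: xmin values are compared as plain integers, which ignores
    transaction ID wraparound.
    """
    return (parse_lsn(remote_lsn) >= parse_lsn(local_lsn)
            and remote_xmin >= local_xmin)


if __name__ == "__main__":
    # Values from the "waiting for remote slot" log line: prints False.
    print(remote_has_passed_local("0/C00316A0", 752, "0/C0049530", 754))
    # Values from the "wait over for remote slot" log line: prints True.
    print(remote_has_passed_local("0/C005FFD8", 756, "0/C0049530", 754))
```

With the test case above, nothing decodes from logical_slot1, so under this
reading the first situation would persist for that slot.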

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
