Re: Synchronizing slots from primary to standby

From: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-11-10 08:15:39
Message-ID: dd9dbbaf-ca77-423a-8d62-bfc814626b47@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 11/10/23 8:55 AM, Amit Kapila wrote:
> On Fri, Nov 10, 2023 at 12:50 PM Drouvot, Bertrand
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>>
>> But even if we ERROR out instead of emitting a WARNING, the user would still
>> need to be notified/monitor such errors. I agree that then probably they will
>> come to know earlier because the slot sync mechanism would be stopped but still
>> it is not "guaranteed" (specially if there is no others "working" synced slots
>> around.)
>
>>
>> And if they do not, then there is still a risk to use this slot after a
>> failover thinking this is a "synced" slot.
>>
>
> I think this is another reason that probably giving ERROR has better
> chances for the user to notice before failover. IF knowing such errors
> user still proceeds with the failover, the onus is on her.

Agree. My concern is more when they don't know about the error.

> We can
> probably document this hazard along with the failover feature so that
> users are aware that they either need to be careful while creating
> slots on standby or consult ERROR logs. I guess we can even make it
> visible in the view also.

Yeah.

>> Giving more thoughts, what about using a dedicated/reserved naming convention for
>> synced slot like synced_<primary_slot_name> or such and then:
>>
>> - prevent user to create sync_<whatever> slots on standby
>> - sync <slot> on primary to sync_<slot> on standby
>> - during failover, rename sync_<slot> to <slot> and if <slot> exists then
>> emit a WARNING and keep sync_<slot> in place.
>>
>> That way both slots are still in place (the manually created <slot> and
>> the sync_<slot<) and one could decide what to do with them.
>>
>
> Hmm, I think after failover, users need to rename all slots or we need
> to provide a way to rename them so that they can be used by
> subscribers which sounds like much more work.

Agree that's much more work for the subscriber case. Maybe that's not worth
the extra work.

>>> Also, the current coding doesn't ensure
>>> we will always give WARNING. If we see the below code that deals with
>>> this WARNING,
>>>
>>> +  /* User created slot with the same name exists, emit WARNING. */
>>> +  else if (found && s->data.sync_state == SYNCSLOT_STATE_NONE)
>>> +  {
>>> +    ereport(WARNING,
>>> +        errmsg("not synchronizing slot %s; it is a user created slot",
>>> +           remote_slot->name));
>>> +  }
>>> +  /* Otherwise create the slot first. */
>>> +  else
>>> +  {
>>> +    TransactionId xmin_horizon = InvalidTransactionId;
>>> +    ReplicationSlot *slot;
>>> +
>>> +    ReplicationSlotCreate(remote_slot->name, true, RS_EPHEMERAL,
>>> +               remote_slot->two_phase, false);
>>>
>>> I think this is not a solid check to ensure that the slot existed
>>> before. Because it could be created as soon as the slot sync worker
>>> invokes ReplicationSlotCreate() here.
>>
>> Agree.
>>
>
> So, having a concrete check to give WARNING would require some more
> logic which I don't think is a good idea to handle this boundary case.
>

Yeah good point, agree to just error out in all the case then (if we discard
the sync_ reserved wording proposal, which seems to be the case as probably
not worth the extra work).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nazir Bilal Yavuz 2023-11-10 08:29:34 Re: Failure during Building Postgres in Windows with Meson
Previous Message Hayato Kuroda (Fujitsu) 2023-11-10 07:59:49 RE: MinGW compiler warnings in ecpg tests