Re: A proposal to force-drop replication slots to make disabling async/sync standbys or logical replication faster in production environments

From: "Hsu, John" <hsuchen(at)amazon(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: A proposal to force-drop replication slots to make disabling async/sync standbys or logical replication faster in production environments
Date: 2022-06-10 15:12:16
Message-ID: eeb07b30-d1ce-577a-29b9-301aa9530d0e@amazon.com
Lists: pgsql-hackers

Hi,

If the slot you want to drop is active, why couldn't you terminate its
active_pid before dropping it?
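
In PostgreSQL terms, the suggestion is to look up active_pid for the slot in pg_replication_slots, call pg_terminate_backend() on that PID, and then call pg_drop_replication_slot(). The sketch below is a toy, in-memory model of that two-step sequence, not PostgreSQL code. SlotCatalog, Slot, acquire and terminate_then_drop are hypothetical names invented for illustration. The model only captures one rule: dropping an active slot fails, and terminating the holding process first makes the drop succeed.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Slot:
    name: str
    active_pid: Optional[int] = None  # None means no process holds the slot


class SlotCatalog:
    """Toy, in-memory stand-in for pg_replication_slots (not a real API)."""

    def __init__(self) -> None:
        self.slots: Dict[str, Slot] = {}
        self.backends: Set[int] = set()

    def acquire(self, name: str, pid: int) -> None:
        # A connected walsender holding the slot makes it "active".
        self.slots.setdefault(name, Slot(name)).active_pid = pid
        self.backends.add(pid)

    def terminate_backend(self, pid: int) -> bool:
        # Models pg_terminate_backend(): the process exits and releases its slot.
        if pid not in self.backends:
            return False
        self.backends.discard(pid)
        for slot in self.slots.values():
            if slot.active_pid == pid:
                slot.active_pid = None
        return True

    def drop_slot(self, name: str) -> None:
        # Models pg_drop_replication_slot(): an active slot cannot be dropped.
        slot = self.slots[name]
        if slot.active_pid is not None:
            raise RuntimeError(f"slot {name!r} is active for PID {slot.active_pid}")
        del self.slots[name]


def terminate_then_drop(catalog: SlotCatalog, name: str) -> None:
    """Terminate the slot's active_pid, if any, and then drop the slot."""
    pid = catalog.slots[name].active_pid
    if pid is not None:
        catalog.terminate_backend(pid)
    catalog.drop_slot(name)


catalog = SlotCatalog()
catalog.acquire("standby1_slot", pid=4242)
try:
    catalog.drop_slot("standby1_slot")
except RuntimeError as err:
    print(err)  # slot 'standby1_slot' is active for PID 4242
terminate_then_drop(catalog, "standby1_slot")
print("standby1_slot" in catalog.slots)  # False
```

In a real deployment, the two steps are separate statements. A standby or subscriber that is still running may reconnect and re-acquire the slot between them, so the drop can still fail. The toy model does not simulate that.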

On 6/10/22 3:03 AM, Bharath Rupireddy wrote:
>
> On Thu, Jun 9, 2022 at 12:11 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> On Thu, Jun 9, 2022 at 11:07 AM Bharath Rupireddy
>> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>>> Currently postgres doesn't allow dropping a replication slot that's active [1]. This can make certain operations time-consuming, or even leave them stuck, in production environments. These operations are disabling async/sync standbys and disabling logical replication, which require the postgres server running on the standby or the subscriber to go down. If stopping the postgres server takes time, the VM or container has to be killed forcefully, which can take a considerable amount of time because there are many layers in between.
>>>
>> Why do you want to drop the slot when the server is going down? Is it
>> some temporary replication slot? Otherwise, how will you resume
>> replication after restarting the server?
> The setup is this: a primary, a bunch of sync standbys, a bunch of read
> replicas (async standbys), and a bunch of logical replication
> subscribers. When the user wants to remove any of them, for whatever
> reason, the typical flow is to first stop the server. If stopping the
> server takes time (for instance, because the standbys or subscribers lag
> far behind the primary), the VM/host server is killed to make the
> corresponding replication slots inactive on the primary, and then the
> replication slots are dropped. The proposed force-drop function helps
> speed up these operations in production environments, and it will also
> make it possible to provide an SLA for these disable operations.
>
> I hope the use case is clear.
>
> Regards,
> Bharath Rupireddy.
>
>

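The disable flow Bharath describes has four steps. First, stop the server. If the stop takes too long, kill the VM/host. Then wait for the slot to become inactive on the primary, and finally drop it. The flow can be sketched as a small Python driver. This is a minimal sketch under stated assumptions. The names graceful_stop, force_kill, slot_is_active and drop_slot are hypothetical hooks into your own tooling, not PostgreSQL or cloud-provider APIs. The timeout values are placeholders.

```python
import time
from typing import Callable


def disable_replica(
    graceful_stop: Callable[[float], bool],
    force_kill: Callable[[], None],
    slot_is_active: Callable[[], bool],
    drop_slot: Callable[[], None],
    stop_timeout_s: float,
    inactive_timeout_s: float = 60.0,
    poll_interval_s: float = 0.05,
) -> str:
    """Take a standby/subscriber down, then drop its slot on the primary.

    All four callables are hypothetical hooks. Returns "graceful" or
    "forced" depending on how the replica went down.
    """
    # Step 1: ask the server to stop, but wait at most stop_timeout_s seconds.
    mode = "graceful"
    if not graceful_stop(stop_timeout_s):
        # Step 2: the stop is taking too long, so kill the VM/container.
        force_kill()
        mode = "forced"

    # Step 3: the primary may take a while (e.g. until wal_sender_timeout
    # expires) to notice the walsender is gone and mark the slot inactive.
    deadline = time.monotonic() + inactive_timeout_s
    while slot_is_active():
        if time.monotonic() >= deadline:
            raise TimeoutError("slot still active after replica shutdown")
        time.sleep(poll_interval_s)

    # Step 4: the slot is inactive, so dropping it is now allowed.
    drop_slot()
    return mode


state = {"killed": False, "dropped": False}
result = disable_replica(
    graceful_stop=lambda timeout: False,  # pretend the stop timed out
    force_kill=lambda: state.update(killed=True),
    slot_is_active=lambda: not state["killed"],
    drop_slot=lambda: state.update(dropped=True),
    stop_timeout_s=30.0,
)
print(result, state)  # forced {'killed': True, 'dropped': True}
```

The time spent in force_kill and in the step 3 wait is the part the thread says is hard to bound. A force-drop function on the primary would let step 4 run without waiting for either.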