Re: Logical decoding on standby

From: Andres Freund <andres(at)anarazel(dot)de>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(dot)riggs(at)2ndquadrant(dot)com>, Thom Brown <thom(at)linux(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Logical decoding on standby
Date: 2017-03-28 15:22:38
Message-ID: 20170328152238.h3ikwwsl5kbqq6nk@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2017-03-27 16:03:48 +0800, Craig Ringer wrote:
> On 27 March 2017 at 14:08, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>
> > So this patch makes ReplicationSlotAcquire check that the slot
> > database matches the current database and refuse to acquire the slot
> > if it does not.
>
> New patch attached that drops above requirement, so slots can still be
> dropped from any DB.
>
> This introduces a narrow race window where DROP DATABASE may ERROR if
> somebody connects to a different database and runs a
> pg_drop_replication_slot(...) for one of the slots being dropped by
> DROP DATABASE after we check for active slots but before we've dropped
> the slot. But it's hard to hit and it's pretty harmless; the worst
> possible result is dropping one or more of the slots before we ERROR
> out of the DROP. But you clearly didn't want them anyway, since you
> were dropping the DB and dropping some slots at the same time.
>
> I think this one's ready to go.
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

> From 99d5313d3a265bcc57ca6845230b9ec49d188710 Mon Sep 17 00:00:00 2001
> From: Craig Ringer <craig(at)2ndquadrant(dot)com>
> Date: Wed, 22 Mar 2017 13:21:09 +0800
> Subject: [PATCH] Make DROP DATABASE drop logical slots for the DB
>
> Automatically drop all logical replication slots associated with a
> database when the database is dropped.
> ---
> doc/src/sgml/func.sgml | 3 +-
> doc/src/sgml/protocol.sgml | 2 +
> src/backend/commands/dbcommands.c | 32 +++++---
> src/backend/replication/slot.c | 88 ++++++++++++++++++++++
> src/include/replication/slot.h | 1 +
> src/test/recovery/t/006_logical_decoding.pl | 40 +++++++++-
> .../recovery/t/010_logical_decoding_timelines.pl | 30 +++++++-
> 7 files changed, 182 insertions(+), 14 deletions(-)
>
> diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
> index ba6f8dd..78508d7 100644
> --- a/doc/src/sgml/func.sgml
> +++ b/doc/src/sgml/func.sgml
> @@ -18876,7 +18876,8 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
> <entry>
> Drops the physical or logical replication slot
> named <parameter>slot_name</parameter>. Same as replication protocol
> - command <literal>DROP_REPLICATION_SLOT</>.
> + command <literal>DROP_REPLICATION_SLOT</>. For logical slots, this must
> + be called when connected to the same database the slot was created on.
> </entry>
> </row>
>
> diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
> index b3a5026..5f97141 100644
> --- a/doc/src/sgml/protocol.sgml
> +++ b/doc/src/sgml/protocol.sgml
> @@ -2034,6 +2034,8 @@ The commands accepted in walsender mode are:
> <para>
> Drops a replication slot, freeing any reserved server-side resources. If
> the slot is currently in use by an active connection, this command fails.
> + If the slot is a logical slot that was created in a database other than
> + the database the walsender is connected to, this command fails.
> </para>
> <variablelist>
> <varlistentry>

Shouldn't the docs in the drop database section about this?

> +void
> +ReplicationSlotsDropDBSlots(Oid dboid)
> +{
> + int i;
> +
> + if (max_replication_slots <= 0)
> + return;
> +
> +restart:
> + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
> + for (i = 0; i < max_replication_slots; i++)
> + {
> + ReplicationSlot *s;
> + NameData slotname;
> + int active_pid;
> +
> + s = &ReplicationSlotCtl->replication_slots[i];
> +
> + /* cannot change while ReplicationSlotCtlLock is held */
> + if (!s->in_use)
> + continue;
> +
> + /* only logical slots are database specific, skip */
> + if (!SlotIsLogical(s))
> + continue;
> +
> + /* not our database, skip */
> + if (s->data.database != dboid)
> + continue;
> +
> + /* Claim the slot, as if ReplicationSlotAcquire()ing. */
> + SpinLockAcquire(&s->mutex);
> + strncpy(NameStr(slotname), NameStr(s->data.name), NAMEDATALEN);
> + NameStr(slotname)[NAMEDATALEN-1] = '\0';
> + active_pid = s->active_pid;
> + if (active_pid == 0)
> + {
> + MyReplicationSlot = s;
> + s->active_pid = MyProcPid;
> + }
> + SpinLockRelease(&s->mutex);
> +
> + /*
> + * We might fail here if the slot was active. Even though we hold an
> + * exclusive lock on the database object a logical slot for that DB can
> + * still be active if it's being dropped by a backend connected to
> + * another DB or is otherwise acquired.
> + *
> + * It's an unlikely race that'll only arise from concurrent user action,
> + * so we'll just bail out.
> + */
> + if (active_pid)
> + elog(ERROR, "replication slot %s is in use by pid %d",
> + NameStr(slotname), active_pid);
> +
> + /*
> + * To avoid largely duplicating ReplicationSlotDropAcquired() or
> + * complicating it with already_locked flags for ProcArrayLock,
> + * ReplicationSlotControlLock and ReplicationSlotAllocationLock, we
> + * just release our ReplicationSlotControlLock to drop the slot.
> + *
> + * For safety we'll restart our scan from the beginning each
> + * time we release the lock.
> + */
> + LWLockRelease(ReplicationSlotControlLock);
> + ReplicationSlotDropAcquired();
> + goto restart;
> + }
> + LWLockRelease(ReplicationSlotControlLock);
> +
> + /* recompute limits once after all slots are dropped */
> + ReplicationSlotsComputeRequiredXmin(false);
> + ReplicationSlotsComputeRequiredLSN();

I was concerned for a second that we'd skip doing
ReplicationSlotsComputeRequired* if we ERROR out above - but
ReplicationSlotDropAcquired already calls these as necessary. I.e. they
should be dropped from here.

- Andres

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2017-03-28 15:23:39 Re: GUC for cleanup indexes threshold.
Previous Message Peter Eisentraut 2017-03-28 15:16:27 Re: dblink module printing unnamed connection (with commit acaf7ccb94)