Re: Logical Replication WIP

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Steve Singer <steve(at)ssinger(dot)info>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers-owner(at)postgresql(dot)org
Subject: Re: Logical Replication WIP
Date: 2016-12-13 00:33:25
Message-ID: 20161213003325.npmnfx2znwnoffkf@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

HJi,

On 2016-12-12 09:18:48 -0500, Peter Eisentraut wrote:
> On 12/8/16 4:10 PM, Petr Jelinek wrote:
> > On 08/12/16 20:16, Peter Eisentraut wrote:
> >> On 12/6/16 11:58 AM, Peter Eisentraut wrote:
> >>> On 12/5/16 6:24 PM, Petr Jelinek wrote:
> >>>> I think that the removal of changes to ReplicationSlotAcquire() that you
> >>>> did will result in making it impossible to reacquire temporary slot once
> >>>> you switched to different one in the session as the if (active_pid != 0)
> >>>> will always be true for temp slot.
> >>>
> >>> I see. I suppose it's difficult to get a test case for this.
> >>
> >> I created a test case, saw the error of my ways, and added your code
> >> back in. Patch attached.
> >>
> >
> > Hi,
> >
> > I am happy with this version, thanks for moving it forward.
>
> committed

Hm.

/*
+ * Cleanup all temporary slots created in current session.
+ */
+void
+ReplicationSlotCleanup()

I'd rather see a (void) there. The prototype has it, but still.

+
+ /*
+ * No need for locking as we are only interested in slots active in
+ * current process and those are not touched by other processes.

I'm a bit suspicious of this claim. Without a memory barrier you could
actually look at outdated versions of active_pid. In practice there's
enough full memory barriers in the slot creation code that it's
guaranteed to not be the same pid from before a wraparound though.

I think that doing iterations of slots without
ReplicationSlotControlLock makes things more fragile, because suddenly
assumptions that previously held aren't true anymore. E.g. factually
/*
* The slot is definitely gone. Lock out concurrent scans of the array
* long enough to kill it. It's OK to clear the active flag here without
* grabbing the mutex because nobody else can be scanning the array here,
* and nobody can be attached to this slot and thus access it without
* scanning the array.
*/
is now simply not true anymore. It's probably not harmfully broken, but
at least you've changed the locking protocol without adapting comments.

/*
- * Permanently drop the currently acquired replication slot which will be
- * released by the point this function returns.
+ * Permanently drop the currently acquired replication slot.
*/
static void
ReplicationSlotDropAcquired(void)

Isn't that actually removing interesting information? Yes, the comment's
been moved to ReplicationSlotDropPtr(), but that routine is an internal
one...

@@ -810,6 +810,9 @@ ProcKill(int code, Datum arg)
if (MyReplicationSlot != NULL)
ReplicationSlotRelease();

+ /* Also cleanup all the temporary slots. */
+ ReplicationSlotCleanup();
+

So we now have exactly this code in several places. Why does a
generically named Cleanup routine not also deal with a currently
acquired slot? Right now it'd be more appropriately named
ReplicationSlotDropTemporary() or such.

@@ -1427,13 +1427,14 @@ pg_replication_slots| SELECT l.slot_name,
l.slot_type,
l.datoid,
d.datname AS database,
+ l.temporary,
l.active,
l.active_pid,
l.xmin,
l.catalog_xmin,
l.restart_lsn,
l.confirmed_flush_lsn
- FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn)
+ FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn)
LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
pg_roles| SELECT pg_authid.rolname,
pg_authid.rolsuper,

If we start to expose this, shouldn't we expose the persistency instead
(i.e. persistent/ephemeral/temporary)?

new file contrib/test_decoding/sql/slot.sql
@@ -0,0 +1,20 @@
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_p', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_t', 'test_decoding', true);
+
+SELECT pg_drop_replication_slot('regression_slot_p');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_p', 'test_decoding', false);
+
+-- reconnect to clean temp slots
+\c

Can we add multiple slots to clean up here? Can we also add a test for
the cleanup on error for temporary slots? E.g. something like in
ddl.sql (maybe we should actually move some of the relevant tests from
there to here).

It'd also be good to test this with physical slots?

+-- test switching between slots in a session
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 'test_decoding', true);
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot2', 'test_decoding', true);
+SELECT * FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL);
+SELECT * FROM pg_logical_slot_get_changes('regression_slot2', NULL, NULL);

Can we actually output something? Right now this doesn't test that
much...

- Andres

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikita Glukhov 2016-12-13 00:38:59 PATCH: recursive json_populate_record()
Previous Message Petr Jelinek 2016-12-13 00:05:08 Re: pgsql: Add support for temporary replication slots