Re: Logical Replication WIP

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Steve Singer <steve(at)ssinger(dot)info>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Logical Replication WIP
Date: 2016-10-24 13:22:39
Message-ID: 7ce7eb9c-2011-2f85-54b5-480003388e41@2ndquadrant.com
Lists: pgsql-hackers

Hi,

attached is an updated version of the patch.

There are quite a few improvements and some restructuring. I fixed all
the bugs and basically everything that came up in the reviews and was
agreed on. There are still a couple of things missing, i.e. column type
definition in the protocol and some things related to existing data copy.

The biggest changes are:

I added one more prerequisite patch (the first one) which adds ephemeral
slots (or rather, implements a UI on top of code that was mostly already
there). The ephemeral slots are different in that they go away either on
error or when the session is closed. This means the initial data sync
does not have to worry about cleaning up the slots after itself. I think
this will be useful in other places as well (for example basebackup). I
originally wanted to call them temporary slots in the UI, but since the
behavior is a bit different from temp tables I decided to go with what
the underlying code calls them in the UI as well.
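
To illustrate the lifetime rule described above, here is a minimal toy
model in Python. It is not PostgreSQL code: SlotRegistry, session and
create_slot are hypothetical names, and the only behavior taken from the
description is that ephemeral slots disappear both on error and on
session close, while other slots stay.

```python
from contextlib import contextmanager


class SlotRegistry:
    """Toy stand-in for the server's slot table (hypothetical)."""

    def __init__(self):
        self.slots = {}  # slot name -> "persistent" or "ephemeral"

    def create(self, name, ephemeral=False):
        if name in self.slots:
            raise ValueError(f"slot {name!r} already exists")
        self.slots[name] = "ephemeral" if ephemeral else "persistent"


@contextmanager
def session(registry):
    """Model one connection; its ephemeral slots vanish when it ends."""
    created = []

    def create_slot(name, ephemeral=False):
        registry.create(name, ephemeral)
        if ephemeral:
            created.append(name)

    try:
        yield create_slot
    finally:
        # Runs on a normal close and on an error alike.
        for name in created:
            registry.slots.pop(name, None)
```

With this model, a table sync that fails halfway leaves no slot behind,
which is the property the initial data copy relies on.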

I also split out the libpqwalreceiver rewrite into a separate patch which
does just the re-architecture and does not really add new functionality.
And I did the re-architecture a bit differently based on the review.

There is now a new executor module in execReplication.c, with no new
nodes but several utility commands. I moved the tuple lookup functions
there from apply and also wrote new interfaces for doing
inserts/updates/deletes on a table, including index updates, constraint
checks and trigger execution, but without the need for the whole
nodeModifyTable handling.

While rewriting this I also implemented tuple lookup using a sequential
scan, so that we can support replica identity full properly. This
greatly simplified the dependency handling between pkeys and
publications (by removing it completely ;) ). Also, when a table has
replica identity full and a primary key, the code will use the primary
key to look up the row even though it's not the replica identity index,
so that users who want to combine logical replication with some other
system that requires replica identity full (e.g. auditing) still get a
usable experience.
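
The lookup choice described above can be sketched roughly as follows.
This is a simplified Python model, not the code in execReplication.c:
TableInfo and choose_lookup are hypothetical names, and raising an error
when no identity is usable is an assumption, since the mail does not say
what happens in that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableInfo:
    replica_identity: str  # "default", "index", "full" or "nothing"
    replica_identity_index: Optional[str] = None
    primary_key_index: Optional[str] = None


def choose_lookup(table):
    """Return (method, index_name) for finding the row to update or delete."""
    # Preferred case: the table has a replica identity index.
    if table.replica_identity_index is not None:
        return ("index_scan", table.replica_identity_index)
    if table.replica_identity == "full":
        # The primary key is not the replica identity index here,
        # but it still identifies the row, so use it when present.
        if table.primary_key_index is not None:
            return ("index_scan", table.primary_key_index)
        # Otherwise fall back to scanning the whole table.
        return ("seq_scan", None)
    # Assumption: anything else cannot be looked up.
    raise LookupError("no usable replica identity")
```

For example, a table with replica identity full and a primary key named
t_pkey resolves to an index scan on t_pkey, while the same table without
a primary key resolves to a sequential scan.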

The way copy is done was heavily reworked. For one, it uses the
ephemeral slots mentioned above. But more importantly, there are no new
custom commands anymore. Instead the walsender accepts some SQL;
currently allowed are BEGIN, ROLLBACK, SELECT and COPY. The way that is
implemented is probably not perfect and it could use a look from
somebody who knows bison better. How it works is that if the command
sent to the walsender starts with one of the above-mentioned keywords,
the walsender parser passes the whole query back and it is then passed
to exec_simple_query. The main reason why we need BEGIN is so that the
COPY can use the snapshot exported by the slot creation, so that there
is a synchronization point when there are concurrent writes. This
probably needs more discussion.
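
The keyword-based routing described above can be modeled in a few lines
of Python. This is only a sketch of the idea, not the bison grammar from
the patch: route_command and SQL_KEYWORDS are hypothetical names, and
matching on the first alphabetic word is an assumption about how "starts
with" is decided.

```python
import re

# Keywords the mail lists as accepted SQL on a walsender connection.
SQL_KEYWORDS = ("BEGIN", "ROLLBACK", "SELECT", "COPY")

# First run of letters/underscores after optional leading whitespace.
_FIRST_WORD = re.compile(r"\s*([A-Za-z_]+)")


def route_command(command):
    """Return "sql" if the whole string should go to the normal query
    path (exec_simple_query in the mail), else "replication"."""
    match = _FIRST_WORD.match(command)
    keyword = match.group(1).upper() if match else ""
    return "sql" if keyword in SQL_KEYWORDS else "replication"
```

Under this model, "COPY public.t TO STDOUT" is handed to the regular
query path unchanged, while existing replication commands such as
IDENTIFY_SYSTEM or START_REPLICATION stay with the walsender's own
parser.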

I also tried to keep the naming more consistent, so I cleaned up all
mentions of "provider" and changed them to "publisher". Also,
publications don't mention that they "replicate", they just "publish"
now (that has an effect on the DDL syntax as well).

Some things that were discussed in the reviews that I knowingly didn't
implement include:

Removal of the Oid in pg_publication_rel. That's mainly because it
would need significant changes to pg_dump, which assumes everything
that's dumped has an Oid, and it's not something that seems worth it as
part of this patch.

I also didn't do the outfuncs; it's unclear to me what the rules are
there, as the only DDL statement there is CreateStmt atm.

There are still a few TODOs:

Type info for columns. My current best idea is to write typeOid and
typemod in the relation message and add another message (type message)
that describes the type, which will skip the built-in types (as we can't
really remap those without breaking a lot of software, so they seem safe
to skip). I plan to do this soonish barring objections.
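
As a rough illustration of the type message idea above, the following
Python sketch decides which messages would be emitted for one relation.
It is not the protocol code: the tuple layouts and the name
messages_for_relation are hypothetical, and treating every type OID
below 16384 (PostgreSQL's FirstNormalObjectId) as built-in is an
assumption about how "built-in" would be tested.

```python
# OIDs below this value are reserved for objects created at initdb time.
FIRST_NORMAL_OBJECT_ID = 16384


def messages_for_relation(rel_name, columns, user_types):
    """columns: list of (column_name, type_oid, typmod).
    user_types: mapping of type_oid -> type name for non-built-in types.
    Returns type messages first, then the relation message."""
    messages = []
    seen = set()
    for _, type_oid, _ in columns:
        # Built-in types are skipped; each other type is described once.
        if type_oid < FIRST_NORMAL_OBJECT_ID or type_oid in seen:
            continue
        seen.add(type_oid)
        messages.append(("type", type_oid, user_types[type_oid]))
    # The relation message carries typeOid and typmod for every column.
    messages.append(("relation", rel_name, list(columns)))
    return messages
```

For a table with an int4 column (OID 23) and a column of a user-defined
type, only the user-defined type would get a type message; the int4
column is described by its OID and typmod alone.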

Removal of use of replication origin in the table sync worker.

Parallelization of the initial copy, and the ability to resync (do a new
copy of) a table. These two mainly wait for agreement over how the
current way of doing copy should work.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Add-user-interface-for-EPHEMERAL-replication-slots.patch.gz application/gzip 6.1 KB
0002-Make-libpqwalreceiver-reentrant.patch.gz application/gzip 6.8 KB
0003-Add-PUBLICATION-catalogs-and-DDL.patch.gz application/gzip 28.3 KB
0004-Add-SUBSCRIPTION-catalog-and-DDL.patch.gz application/gzip 24.7 KB
0005-Define-logical-replication-protocol-and-output-plugi.patch.gz application/gzip 12.2 KB
0006-Add-logical-replication-workers.patch.gz application/gzip 39.8 KB
0007-Logical-replication-support-for-initial-data-copy.patch.gz application/gzip 28.4 KB
