Re: pglogical - logical replication contrib module

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pglogical - logical replication contrib module
Date: 2016-02-17 08:24:46
Message-ID: 56C42E4E.9000400@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Craig,

Thanks for your explanation. I have to agree with your arguments that in
general case replication of DDL statement using logical decoding seems
to be problematic. But we are mostly considering logical decoding in
quite limited context: replication between two identical Postgres
database nodes (multimaster).

Do you think that it in this case replication of DLL can be done as
sequence of low level operations with system catalog tables
including manipulation with locks? So in your example with ALTER TABLE
statement, can we correctly replicate it to other nodes
as request to set exclusive lock + some manipulations with catalog
tables and data table itself?
If so, instead of full support of DDL in logical decoding we can only:

1. Add option whether to include operations on system catalog tables in
logical replication or not.
2. Make it possible to replicate lock requests (can be useful not only
for DDLs)

I looked how DDL was implemented in BDR and did it in similar way in our
multimaster.
But it is awful: we need to have two different channels for propagating
changes. Additionally, in multimaster we want to enforce cluster wide
ACID. It certainly includes operations with metadata. It will be very
difficult to implement if replication of DML and DDL is done in two
different ways...

Let me ask one more question concerning logical replication: how
difficult it will be from your point of view to support two phase commit
in logical replication? Are there some principle problems?

Thanks in advance,
Konstantin

On 17.02.2016 04:33, Craig Ringer wrote:
> On 17 February 2016 at 00:54, Oleg Bartunov <obartunov(at)gmail(dot)com
> <mailto:obartunov(at)gmail(dot)com>> wrote:
>
>
> DDL support is what it's missed for now.
>
>
> TBH, based on experience with DDL replication and deparse in BDR, it's
> going to be missing for a while yet too, or at least not
> comprehensively present without caveats or exceptions.
>
>
> Some DDL operations don't translate well to a series of replicatable
> actions. The case I hit the most is
>
> ALTER TABLE mytable ADD COLUMN somecolumn sometype NOT NULL DEFAULT
> some_function();
>
> This is executed (simplified) by taking an ACCESS EXCLUSIVE lock,
> changing the catalogs but not making the changes visible yet,
> rewriting the table, and committing to make the rewritten table and
> the catalog changes visible.
>
> That won't work well with logical replication. We currently capture
> DDL with event triggers and log them to a table for later logical
> decoding and replay - that's the "recognised" way. The trouble being
> that replaying that statement will result in an unnecessary full table
> rewrite on the downstream. Then we have to decode and send stream of
> changes to a table called pg_temp_<oid_of_mytable>, truncate the copy
> of mytable on the downstream that we just rewrote and apply those rows
> instead.
>
> Of course all that only works sensibly if you have exactly one
> upstream and the downstream copy of the table is treated as (or
> enforced as) read-only.
>
> Improving this probably needs DDL deparse to be smarter. Rather than
> just emitting something that can be reconstructed into the SQL text of
> the DDL it needs to emit one or more steps that are semantically the
> same but allow us to skip the rewrite. Along the lines of:
>
> * ALTER TABLE mytable ADD COLUMN somecolumn sometype;
> * ALTER TABLE mytable ALTER COLUMN somecolumn DEFAULT some_function();
> * <wait for rewrite data for mytable>
> * ALTER TABLE mytable ALTER COLUMN somecolumn NOT NULL;
>
> Alternately the downstream would need a hook that lets it intercept
> and prevent table rewrites caused by ALTER TABLE and similar. So it
> can instead just do a truncate and wait for the new rows to come from
> the master.
>
> Note that all this means the standby has to hold an ACCESS EXCLUSIVE
> lock on the table during all of replay. That shouldn't be necessary,
> all we really need is an EXCLUSIVE lock since concurrent SELECTs are
> fine. No idea how to do that.
>
>
>
> Deparse is also just horribly complicated to get right. There are so
> many clauses and subclauses and variants of statements. Each of which
> must be perfect.
>
>
>
> Not everything has a simple and obvious mapping on the downstream side
> either. TRUNCATE ... CASCADE is the obvious one. You do a cascade
> truncate on the master - do you want that to replicate as a cascaded
> truncate on the replica, or a truncate of only those tables that
> actually got truncated on the master? If the replica has additional
> tables with FKs pointing at tables replica the TRUNCATE would truncate
> those too if you replicate it as CASCADE; if you don't the truncate
> will fail instead. Really, both are probably wrong as far as the user
> is concerned, but we can't truncate just the tables truncated on the
> master, ignore the FK relationships, and leave dangling FK references
> either.
>
>
> All this means that DDL replication is probably only going to make
> sense in scenarios where there's exactly one master and the replica
> obeys some rules like "don't create FKs pointing from non-replicated
> tables to tables replicated from somewhere else". A concept we
> currently have no way to express or enforce like we do
> persistent-to-UNLOGGED FKs.
>
>
>
> Then there's global objects. Something as simple as:
>
> CREATE ROLE fred;
>
> CREATE TABLE blah(...) OWNER fred;
>
> will break replication because we only see the CREATE TABLE, not the
> CREATE ROLE. If we instead replayed the CREATE ROLE and there were
> multiple connections between different DBs on an upstream and
> downstream apply would fail on all but one. But we can't anyway since
> there's no way to capture that CREATE ROLE from any DB except the one
> it was executed in, which might not even be one of the ones doing
> replication.
>
> I strongly suspect we'll need logical decoding to be made aware of
> such global DDL and decode it from the WAL writes to the system
> catalogs. Which will be fun - but at least modifications to the shared
> catalogs are a lot simpler than the sort of gymnastics done by ALTER
> TABLE, etc.
>
>
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-02-17 08:26:29 Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl
Previous Message Magnus Hagander 2016-02-17 08:24:26 Re: commitfest application doesn't see new patch