Re: pglogical - logical replication contrib module

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pglogical - logical replication contrib module
Date: 2016-02-17 10:39:47
Message-ID: 56C44DF3.6010306@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Ok, what about the following plan:

1. Support custom WAL records (as far as I know 2ndQuadrant has such patch).
2. Add one more function to logical decoding allowing to deal with
custom records.

So the idea is that we somehow record DDL in WAL (for example using
executor hook),
then them are proceeded using logical decoding, calling special logical
deocding plugin function to handle this records.
For example we can store DDL in WAL just as SQL statements and so easily
replay them.

In this case DDL will be replicated using the same mechanism and through
the same channel as DML.

On 17.02.2016 12:16, Craig Ringer wrote:
> On 17 February 2016 at 16:24, Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru <mailto:k(dot)knizhnik(at)postgrespro(dot)ru>> wrote:
>
> Thanks for your explanation. I have to agree with your arguments
> that in general case replication of DDL statement using logical
> decoding seems to be problematic. But we are mostly considering
> logical decoding in quite limited context: replication between two
> identical Postgres database nodes (multimaster).
>
>
> Yep, much like BDR. Where all this infrastructure came from and is/was
> aimed at.
>
> Do you think that it in this case replication of DLL can be done
> as sequence of low level operations with system catalog tables
> including manipulation with locks?
>
>
> No.
>
> For one thing logical decoding doesn't see catalog tuple changes right
> now. Though I imagine that could be changed easily enough.
>
> More importantly - oids. You add a column to a table:
>
> ALTER TABLE mytable ADD COLUMN mycolumn some_type UNIQUE NOT NULL
> DEFAULT some_function()
>
> This writes to catalogs including:
>
> pg_attribute
> pg_constraint
> pg_index
> pg_class (for the index relation)
>
> ... probably more. It also refers to pg_class (for the definition of
> mytable), pg_type (definition of some_type), pg_proc (definition of
> some_function), the b-tree operator class for some_type in pg_opclass,
> the b-tree indexam in pg_am, ... more.
>
> Everything is linked by oids, and the oids are all node local. You
> can't just blindly re-use them. If "some_type" is hstore, the oid of
> hstore in pg_type might be different on the upstream and downstream.
> The only exception is the oids of built-in types and even then that's
> not guaranteed across major versions.
>
> So if you blindly replicate catalog row changes you'll get a horrible
> mess. That's before considering a table's relfilenode, which is
> initially the same as its oid, but subject to change if truncated or
> rewritten.
>
> To even begin to do this half-sanely you'd have to maintain a mapping
> of upstream object oids->names on the downstream, with invalidations
> replicated from the upstream. That's only the beginning. There's
> handling of extensions and lots more fun.
>
> So in your example with ALTER TABLE statement, can we correctly
> replicate it to other nodes
> as request to set exclusive lock + some manipulations with catalog
> tables and data table itself?
>
>
> Nope. No hope, not unless "some manipulations with catalog tables and
> data table its self" is a lot more comprehensive than I think you mean.
>
> 1. Add option whether to include operations on system catalog
> tables in logical replication or not.
>
>
> I would like to have this anyway.
>
> 2. Make it possible to replicate lock requests (can be useful not
> only for DDLs)
>
>
> I have no idea how you'd even begin to do that.
>
> I looked how DDL was implemented in BDR and did it in similar way
> in our multimaster.
> But it is awful: we need to have two different channels for
> propagating changes.
>
>
> Yeah, it's not beautiful, but maybe you misunderstood something? The
> DDL is written to a table, and that table's changes are replayed along
> with everything else. It's consistent and keeps DDL changes as part of
> the xact that performed them. Maybe you misunderstood how it works in
> BDR and missed the indirection via a table?
>
> Additionally, in multimaster we want to enforce cluster wide ACID.
> It certainly includes operations with metadata. It will be very
> difficult to implement if replication of DML and DDL is done in
> two different ways...
>
>
> That's pretty much why BDR does it this way, warts and all. Though it
> doesn't offer cluster-wide ACID it does need atomic commit of xacts
> that may contain DML, DDL, or some mix of the two.
>
> Let me ask one more question concerning logical replication: how
> difficult it will be from your point of view to support two phase
> commit in logical replication? Are there some principle problems?
>
>
> I haven't looked closely yet. Andres will know more.
>
> I very, very badly want to be able to decode 2PC prepared xacts myself.
>
> The main issue I'm aware of is locking - specifically the inability to
> impersonate another backend and treat locks held by that backend
> (which might be a fake backend for a pg_prepared_xacts entry) as held
> by ourselves for the purpose of being able to access relations, etc.
>
> The work Robert is doing on group locking looks absolutely ideal for
> this, but won't land before 9.7.
>
> (Closely related, I also want to be able to hook into commit and
> transform a normal COMMIT into a PREPARE TRANSACTION, <do some stuff>,
> COMMIT PREPARED with the application that issued the commit none the
> wiser. This will allow pessimistic 2PC-based conflict handling for
> must-succeed xacts like those that do DDL).
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stas Kelvich 2016-02-17 11:14:57 Re: tsearch_extras extension
Previous Message Michael Paquier 2016-02-17 10:32:06 Re: exposing pg_controldata and pg_config as functions