Re: Dealing with logical replication

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Dealing with logical replication
Date: 2017-07-20 05:21:11
Message-ID: CAMsr+YFMcHSN_SbrT0dXHhvrzCKHhiU22pGQwYxTO9pndyg4kw@mail.gmail.com
Lists: pgsql-hackers

On 19 July 2017 at 16:34, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> wrote:

> Now that we are going to have logical replication in PostgreSQL 10, I
> have started thinking how Pgpool-II can deal with it. For example, the
> logical replication does not replicate DDLs. Isn't it convenient for
> users to do it automatically in Pgpool-II? Or even doing it for
> TRUNCATE?
>
> Or are they against the design philosophy of the logical replication?
>

(Disclaimer: Petr Jelinek and Peter Eisentraut were the main ones working
on in-core logical rep, and I haven't had time to play with it much.)

TL;DR: a pooler can only solve a limited subset of the problem, fairly
unreliably, and this is really something that needs further work in core.

Not replicating TRUNCATE and schema changes is more an implementation
limitation than anything else. It's desirable to get to the point where
logical replication can transparently replicate DDL. There are some hurdles
for doing so, but it'll be possible to get something in place that's good
enough when time/release progress permits.

Similarly, with TRUNCATE, AFAIK support just didn't make the cut for pg10.

A pooler could well help in the meantime, but you have to consider
ordering with care. For example, given "U" upstream, "D" downstream:

U: CREATE TABLE x (a integer, b integer);
D: CREATE TABLE x (a integer, b integer); -- user script syncs
U: INSERT INTO x (a,b) VALUES (1,0);
D: [applies INSERT 1,0]
U: INSERT INTO x (a,b) VALUES (2,0);
U: ALTER TABLE x DROP COLUMN b;
D: ALTER TABLE x DROP COLUMN b; -- user script syncs
U: INSERT INTO x (a) VALUES (3);
D: [ERROR on INSERT of 2,0: no column 'b']

Because the DDL here is transported out of band relative to the row data, you
can easily create situations where the schema change is applied on the
downstream before the last committed-but-not-yet-replicated data from the
upstream that was written under the old schema.

To achieve correct ordering, the simplest approach is to record DDL in a
table when you perform it on the upstream, and replay it when you see rows
from that table appear on the downstream; at that point you know it's safe
to replay. This is the essence of what BDR and pglogical do for their DDL
replication, except that they also handle DDL in the middle of transactions
that make row-data changes, by intercepting writes to the queue table and
performing the DDL at the exact point in the transaction where it happened
on the upstream. I don't think that's possible with in-core logical
replication yet, and it's certainly not something a pooler can do.
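
As a rough illustration, here is a minimal sketch of that queue-table
approach done purely at the SQL level; the table name and wrapper pattern
are invented for the example, not what BDR or pglogical actually ship:

-- Upstream: a queue table added to the publication, so queued DDL
-- travels in the replication stream alongside the row data it depends on.
CREATE TABLE ddl_queue (
    id        bigserial PRIMARY KEY,
    queued_at timestamptz NOT NULL DEFAULT now(),
    ddl       text NOT NULL
);

-- Upstream: run the DDL and queue it in the same transaction.
BEGIN;
ALTER TABLE x DROP COLUMN b;
INSERT INTO ddl_queue (ddl) VALUES ('ALTER TABLE x DROP COLUMN b;');
COMMIT;

-- Downstream: some agent polls ddl_queue and EXECUTEs new entries as they
-- arrive. Because logical replication applies transactions in commit order,
-- a queue row only becomes visible after all earlier row changes.

There is still a window where upstream changes made under the new schema
arrive before the downstream agent has executed the queued DDL, which is
exactly why BDR and pglogical apply it synchronously at the right point in
the apply stream rather than polling.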

To do it externally, you have to note the upstream LSN at which a schema
change happened, wait until the downstream has replayed and confirmed up to
at least that LSN, and only then apply the schema change on the downstream.
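
In pg10 terms, a rough sketch of that check, assuming the subscription's
replication slot on the upstream is named 'mysub' (the names and LSN here
are placeholders):

-- Upstream, immediately after the DDL transaction commits:
SELECT pg_current_wal_lsn() AS ddl_lsn;

-- Upstream, before running the same DDL on the downstream: wait until the
-- subscriber's slot has confirmed flushing past that LSN.
SELECT confirmed_flush_lsn >= '0/1234ABCD'::pg_lsn AS safe_to_apply
FROM pg_replication_slots
WHERE slot_name = 'mysub';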

A pooler trying to help here must also be very aware of the impact of
multi-statements. If you send a single simple query message with a mixture
of schema change commands and normal DML, you probably don't want to repeat
the DML on downstream nodes or you'll get duplicate rows etc. But unless it
embeds explicit transaction control commands, a simple query message
executes in a single implicit transaction, so if you extract just the DDL
you'll again have ordering issues between upstream and downstream.
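
For example, a client might send all of this as one simple query message:

INSERT INTO x (a, b) VALUES (4, 0); ALTER TABLE x DROP COLUMN b; INSERT INTO x (a) VALUES (5);

If the pooler forwards the whole string to downstream nodes it duplicates
the INSERTs, since they also arrive via logical replication; if it extracts
just the ALTER and runs it on the downstream immediately, the ALTER can land
before the replicated INSERT of (4, 0) that was written under the old
schema, which is the same ordering problem as before.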

There are a variety of difficulties to overcome even when doing it in core:
event triggers don't capture DDL command text and have no facility to turn
the internal command representation back into SQL command text, nor do we
have any way to turn the internal representation back into a parse node tree
for execution by a downstream's standard_ProcessUtility. However, we can now
capture raw command text with ProcessUtility_hook, since we have byte ranges
for the individual statements of a multi-statement query (yay!), and that
works well enough if you also capture the active search_path and apply with
the same search_path. It can still match the wrong object if extra objects
with the same name are present earlier on the search_path on the downstream
than on the upstream, so it's not ideal, but that's a weird corner case.
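
A minimal sketch of the "apply with the same search_path" idea on the
downstream, assuming the hook captured both the command text and the
search_path in effect at the time (the values here are invented):

BEGIN;
SET LOCAL search_path = app, public;   -- search_path captured on the upstream
ALTER TABLE x DROP COLUMN b;           -- captured command text
COMMIT;

SET LOCAL keeps the setting scoped to that transaction, so replaying a
captured command doesn't leak a foreign search_path into the rest of the
session.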

If we had a hook in the logical apply worker's insert or wal-message
routines it'd be possible to write an extension to do this for pg10, but
AFAICS we don't.

So schema changes in logical replication currently require more care than
in physical replication.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
