RE: Initial Schema Sync for Logical Replication

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Euler Taveira <euler(at)eulerto(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Sachin Kumar <ssetiya(at)amazon(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Initial Schema Sync for Logical Replication
Date: 2023-03-24 11:57:00
Message-ID: OS0PR01MB5716088E497BDCBCED7FC3DA94849@OS0PR01MB5716.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Friday, March 24, 2023 12:02 AM Euler Taveira <euler(at)eulerto(dot)com> wrote:
>
> On Thu, Mar 23, 2023, at 8:44 AM, Amit Kapila wrote:
> > On Thu, Mar 23, 2023 at 2:48 AM Euler Taveira <mailto:euler(at)eulerto(dot)com> wrote:
> > >
> > > On Tue, Mar 21, 2023, at 8:18 AM, Amit Kapila wrote:
> > >
> > > Now, how do we avoid these problems even if we have our own version of
> > > functionality similar to pg_dump for selected objects? I guess we will
> > > face similar problems. If so, we may need to deny schema sync in any
> > > such case.
> > >
> > > There are 2 approaches for initial DDL synchronization:
> > >
> > > 1) generate the DDL command on the publisher, stream it and apply it as-is on
> > > the subscriber;
> > > 2) generate a DDL representation (JSON, for example) on the publisher, stream
> > > it, transform it into a DDL command on subscriber and apply it.
> > >
> > > The option (1) is simpler and faster than option (2) because it does not
> > > require an additional step (transformation). However, option (2) is more
> > > flexible than option (1) because it allows you to create a DDL command even if a
> > > feature was removed from the subscriber and the publisher version is less than
> > > the subscriber version or a feature was added to the publisher and the
> > > publisher version is greater than the subscriber version.
> > >
> >
> > Is this practically possible? Say the publisher has a higher version
> > that has introduced a new object type corresponding to which it has
> > either a new catalog or some new columns in the existing catalog. Now,
> > I don't think the older version of the subscriber can modify the
> > command received from the publisher so that the same can be applied to
> > the subscriber because it won't have any knowledge of the new feature.
> > In the other case where the subscriber is of a newer version, we
> > anyway should be able to support it with pg_dump as there doesn't
> > appear to be any restriction with that, am I missing something?
> I think so (with some limitations). Since the publisher knows the subscriber
> version, publisher knows that the subscriber does not contain the new object
> type then publisher can decide if this case is critical (and reject the
> replication) or optional (and silently not include the feature X -- because it
> is not essential for logical replication). If required, the transformation
> should be done on the publisher.

I am not sure whether it's feasible to support the use case of replicating DDL
to an older subscriber.

First, I think the publisher currently doesn't know the version number of the
client (subscriber), so we would need to check whether that is feasible. Also,
adding checks on the client's version number doesn't seem like a good idea.

Besides, I thought about the problems that would arise if we tried to support
replicating from a newer PG to an older PG. The following examples assume that
DDL replication is supported in the PG versions mentioned.

1) Assume we want to replicate from a newer PG to an older PG in which
partitioned tables have not been introduced. I think even if the publisher is
aware of that, it doesn't have a good way to transform the partition-related
commands. One could say we can transform them into inheritance tables, but I
feel that introduces too much complexity.

2) Another example is generated columns. When replicating from a newer PG that
has this feature to an older PG that lacks it, I am not sure there is a way to
transform the command without causing inconsistent behavior.

Even if we decide to simply skip sending such unsupported commands, or skip
applying them, it's likely that the subsequent DML replication will cause
data inconsistency.
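
As a toy illustration of that last point, here is a minimal sketch in Python.
It is not PostgreSQL code: the Node class, the feature names and the change
tuples are all hypothetical, and the model only tracks whether a table exists
on the subscriber. It shows that once a DDL command is skipped because the
subscriber lacks the feature, the DML that follows it has nothing to apply to.

```python
# Toy model only; none of these names correspond to PostgreSQL internals.

class Node:
    """A hypothetical subscriber that supports a fixed set of features."""

    def __init__(self, name, supported_features):
        self.name = name
        self.supported = set(supported_features)
        self.tables = {}  # table name -> list of rows (dicts)

    def apply_ddl(self, table, feature):
        """Create `table`; return False (skip) if `feature` is unsupported."""
        if feature not in self.supported:
            return False
        self.tables[table] = []
        return True

    def apply_insert(self, table, row):
        """Apply a replicated INSERT; return False if the table is missing."""
        if table not in self.tables:
            return False
        self.tables[table].append(row)
        return True


def replicate(changes, subscriber):
    """Apply a stream of changes; return those that could not be applied."""
    failed = []
    for kind, table, payload in changes:
        if kind == "ddl":
            ok = subscriber.apply_ddl(table, payload)
        else:
            ok = subscriber.apply_insert(table, payload)
        if not ok:
            failed.append((kind, table, payload))
    return failed


# An older subscriber that (by assumption) only knows plain tables.
old_subscriber = Node("old_pg", supported_features={"plain_table"})

# The publisher creates a partitioned table and then inserts into it.
stream = [
    ("ddl", "measurements", "partitioned_table"),
    ("insert", "measurements", {"id": 1}),
]

failed = replicate(stream, old_subscriber)
```

In this model both the skipped DDL and the following INSERT end up in
`failed`, so the subscriber silently diverges from the publisher unless
replication is stopped with an error.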

So, it seems we cannot completely support this use case; there would be some
limitations. Personally, I am not sure it's worth introducing the complexity to
support it partially.

Best Regards,
Hou zj
