Re: Initial Schema Sync for Logical Replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Kumar, Sachin" <ssetiya(at)amazon(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Initial Schema Sync for Logical Replication
Date: 2023-07-11 06:21:21
Message-ID: CAD21AoAPCFQW87RZEvH6iL8JqrAqj48Vcdhz8mjSfbWfn2GevA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jul 10, 2023 at 8:06 PM Kumar, Sachin <ssetiya(at)amazon(dot)com> wrote:
>
>
>
> > From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > On Wed, Jul 5, 2023 at 7:45 AM Masahiko Sawada
> > <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Jun 19, 2023 at 5:29 PM Peter Smith <smithpb2250(at)gmail(dot)com>
> > wrote:
> > > >
> > > > Hi,
> > > >
> > > > Below are my review comments for the PoC patch 0001.
> > > >
> > > > In addition, the patch needed rebasing, and, after I rebased it
> > > > locally in my private environment there were still test failures:
> > > > a) The 'make check' tests fail but only in a minor way due to
> > > > changes colname
> > > > b) the subscription TAP test did not work at all for me -- many errors.
> > >
> > > Thank you for reviewing the patch.
> > >
> > > While updating the patch, I realized that the current approach won't
> > > work well, or at least has a problem, with partitioned tables. If a
> > > publication has a partitioned table with publish_via_partition_root = false, the
> > > subscriber launches tablesync workers for its partitions so that each
> > > tablesync worker copies the data of one partition. Similarly, if it has a
> > > partitioned table with publish_via_partition_root = true, the subscriber launches
> > > a tablesync worker for the parent table. With the current design,
> > > since the tablesync worker is responsible for both schema and data
> > > synchronization for its target table, it won't be possible to
> > > synchronize both the parent table's schema and the partitions' schemas.
> > >
> >
> > I think one possibility to make this design work is that when publish_via_partition_root
> > is false, we assume that the subscriber already has the parent table, and then
> > the individual tablesync workers can sync the schemas of the partitions and their
> > data.
>
> Since publish_via_partition_root is false by default, users would have to create the parent table themselves,
> which I think is not a good user experience.

I have the same concern. I think that users normally use
publish_via_partition_root = false when the partitioned table on the
subscriber consists of the same set of partitions as the publisher's.
Such users would expect both the partitioned table and its
partitions to be synchronized.
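To make the partitioned-table problem concrete, here is a minimal, hypothetical Python model (not PostgreSQL code). It assumes, as described above for the PoC design, that each tablesync worker creates only the schema of its own target table, and that with publish_via_partition_root = false one worker runs per leaf partition while with true a single worker runs for the root. The names Table, tablesync_targets and missing_schemas are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical model of a published table; partitions is empty for a plain table."""
    name: str
    partitions: list["Table"] = field(default_factory=list)


def leaf_partitions(table: Table) -> list[Table]:
    # Walk the partition tree and collect tables that have no partitions of their own.
    if not table.partitions:
        return [table]
    leaves: list[Table] = []
    for child in table.partitions:
        leaves.extend(leaf_partitions(child))
    return leaves


def all_tables(table: Table) -> set[str]:
    # Every table in the tree: the root, any intermediate partitioned tables, and leaves.
    names = {table.name}
    for child in table.partitions:
        names |= all_tables(child)
    return names


def tablesync_targets(table: Table, publish_via_partition_root: bool) -> list[str]:
    # Assumption: with true, the root is the single unit of sync;
    # with false, each leaf partition gets its own tablesync worker.
    if publish_via_partition_root:
        return [table.name]
    return [leaf.name for leaf in leaf_partitions(table)]


def missing_schemas(table: Table, publish_via_partition_root: bool) -> set[str]:
    # If each worker creates only its own target table, any table not targeted
    # by some worker is never created on the subscriber.
    return all_tables(table) - set(tablesync_targets(table, publish_via_partition_root))


if __name__ == "__main__":
    measurement = Table("measurement", [Table("measurement_2023"), Table("measurement_2024")])
    print(tablesync_targets(measurement, False))       # ['measurement_2023', 'measurement_2024']
    print(sorted(missing_schemas(measurement, False)))  # ['measurement']
    print(sorted(missing_schemas(measurement, True)))   # ['measurement_2023', 'measurement_2024']
```

Under these assumptions neither setting lets per-table workers create the whole partition hierarchy: with false the parent is never created, and with true the partitions are never created.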

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
