Re: Initial Schema Sync for Logical Replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Kumar, Sachin" <ssetiya(at)amazon(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Initial Schema Sync for Logical Replication
Date: 2023-04-06 13:26:33
Message-ID: CAD21AoAMwqcd55B7zdRrSptB1=jzzGs2L5aaLKvL=Q6mD8qqbQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 30, 2023 at 10:11 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Thu, Mar 30, 2023 at 12:18 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Mar 29, 2023 at 7:57 PM Kumar, Sachin <ssetiya(at)amazon(dot)com> wrote:
> > >
> > > > > > > From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > > > > > > > I think we won't be able to use same snapshot because the
> > > > > > > > transaction will be committed.
> > > > > > > > In CreateSubscription() we can use the transaction snapshot from
> > > > > > > > walrcv_create_slot() till walrcv_disconnect() is called.(I am
> > > > > > > > not sure about this part maybe walrcv_disconnect() calls the commits
> > > > internally ?).
> > > > > > > > So somehow we need to keep this snapshot alive, even after
> > > > > > > > transaction is committed(or delay committing the transaction ,
> > > > > > > > but we can have CREATE SUBSCRIPTION with ENABLED=FALSE, so we
> > > > > > > > can have a restart before tableSync is able to use the same
> > > > > > > > snapshot.)
> > > > > > > >
> > > > > > >
> > > > > > > Can we think of getting the table data as well along with schema
> > > > > > > via pg_dump? Won't then both schema and initial data will
> > > > > > > correspond to the same snapshot?
> > > > > >
> > > > > > Right , that will work, Thanks!
> > > > >
> > > > > While it works, we cannot get the initial data in parallel, no?
> > > > >
> > >
> > > I was thinking each TableSync process will call pg_dump --table, This way if we have N
> > > tableSync process, we can have N pg_dump --table=table_name called in parallel.
> > > In fact we can use --schema-only to get schema and then let COPY take care of data
> > > syncing . We will use same snapshot for pg_dump as well as COPY table.
> >
> > How can we postpone creating the pg_subscription_rel entries until the
> > tablesync worker starts and does the schema sync? I think that since
> > pg_subscription_rel entry needs the table OID, we need either to do
> > the schema sync before creating the entry (i.e, during CREATE
> > SUBSCRIPTION) or to postpone creating entries as Amit proposed[1]. The
> > apply worker needs the information of tables to sync in order to
> > launch the tablesync workers, but it needs to create the table schema
> > to get that information.
>
> For the above reason, I think that step 6 of the initial proposal won't work.
>
> If we can have the tablesync worker create an entry of
> pg_subscription_rel after creating the table, it may give us the
> flexibility to perform the initial sync. One idea is that we add a
> relname field to pg_subscription_rel so that we can create entries
> with relname instead of OID if the table is not created yet. Once the
> table is created, we clear the relname field and set the OID of the
> table instead. It's not an ideal solution but we might make it simpler
> later.

While writing a PoC patch, I found some difficulties in this idea.
First, I tried to add schemaname+relname to pg_subscription_rel but I
could not define the primary key of pg_subscription_rel. The primary
key on (srsubid, srrelid) doesn't work since srrelid could be NULL.
Similarly, the primary key on (srsubid, srrelid, schemaname, relname)
also doesn't work. So I tried another idea: that we generate a new OID
for srrelid and the tablesync worker will replace it with the new
table's OID once it creates the table. However, since we use srrelid
in replication slot names, changing srrelid during the initial
schema+data sync is not straightforward (please note that the slot is
created by the tablesync worker but is removed by the apply worker).
Using relname in slot name instead of srrelid is not a good idea since
it requires all pg_subscription_rel entries have relname, and slot
names could be duplicated, for example, when the relname is very long
and we cut it.

I'm trying to consider the idea from another angle: the apply worker
fetches the table list and passes the relname to the tablesync worker.
But a problem of this approach is that the table list is not
persisted. If the apply worker restarts during the initial table sync,
it could not get the same list as before.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-04-06 13:39:46 Re: Minimal logical decoding on standbys
Previous Message Drouvot, Bertrand 2023-04-06 13:03:39 Re: Minimal logical decoding on standbys