Re: Initial Schema Sync for Logical Replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "Kumar, Sachin" <ssetiya(at)amazon(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Initial Schema Sync for Logical Replication
Date: 2023-03-28 09:47:27
Message-ID: CAA4eK1Ld9-5ueomE_J5CA6LfRo=wemdTrUp5qdBhRFwGT+dOUw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 27, 2023 at 8:17 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Fri, Mar 24, 2023 at 11:51 PM Kumar, Sachin <ssetiya(at)amazon(dot)com> wrote:
> >
> > > From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > > > I think we won't be able to use the same snapshot because the transaction
> > > > will be committed.
> > > > In CreateSubscription() we can use the transaction snapshot from
> > > > walrcv_create_slot() till walrcv_disconnect() is called. (I am not sure
> > > > about this part; maybe walrcv_disconnect() commits internally?)
> > > > So somehow we need to keep this snapshot alive even after the transaction
> > > > is committed (or delay committing the transaction, but we can have
> > > > CREATE SUBSCRIPTION with ENABLED=FALSE, so we can have a restart
> > > > before tableSync is able to use the same snapshot).
> > > >
> > >
> > > Can we think of getting the table data as well along with the schema via
> > > pg_dump? Won't both the schema and the initial data then correspond to the
> > > same snapshot?
> >
> > Right, that will work. Thanks!
>
> While it works, we cannot get the initial data in parallel, no?
>

Another possibility is that we dump/restore the schema of each table
along with its data. One thing we can explore is whether pg_dump's
parallel option (--jobs) can be useful here. Do you have any other ideas?

One related idea: currently, we fetch the table list corresponding to
the publications in the subscription and create the entries for those
in pg_subscription_rel during CREATE SUBSCRIPTION. Can we think of
postponing that work till after the initial schema sync? We seem to
already store the publications list in pg_subscription, so it appears
possible if we somehow remember the value of copy_data. If this is
feasible, then I think that may give us the flexibility to have a
background worker perform the initial sync at a later point.
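
To make the idea concrete, here is a rough toy model in Python, not a
proposal for the actual catalog code. The Subscription class and the
create_subscription() and background_sync() helpers are hypothetical names
used only for illustration; the only things taken from the existing design
are that pg_subscription keeps the publications list and that
pg_subscription_rel entries start in state 'i' (init) when copy_data is
true and 'r' (ready) otherwise.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Subscription:
    # Hypothetical in-memory stand-in for a pg_subscription row.
    name: str
    publications: List[str]
    copy_data: bool  # remembered so that a later sync knows whether to copy data
    # Hypothetical stand-in for pg_subscription_rel: table name -> state.
    rel_states: Dict[str, str] = field(default_factory=dict)


def create_subscription(name, publications, copy_data=True):
    # Only the publications list and copy_data are recorded here;
    # no per-table entries are created at this point.
    return Subscription(name, list(publications), copy_data)


def background_sync(sub, publisher_tables):
    # Hypothetical worker step that runs after the schema has been restored.
    # publisher_tables maps a publication name to its list of tables.
    for pub in sub.publications:
        for table in publisher_tables.get(pub, []):
            # 'i' = data copy still pending, 'r' = nothing to copy.
            sub.rel_states.setdefault(table, "i" if sub.copy_data else "r")
    return sub.rel_states
```

In this sketch, create_subscription("s1", ["pub1"]) leaves rel_states
empty, and the entries only appear once background_sync() is called with
the table list fetched from the publisher.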

> >
> > > > I think we can have the same issues as you mentioned: a new table t1
> > > > is added to the publication and the user does a refresh publication.
> > > > pg_dump / pg_restore restores the table definition. But before
> > > > tableSync can start, steps 2 to 5 happen on the publisher.
> > > > > 1. Create Table t1(c1, c2); --LSN: 90
> > > > > 2. Insert t1 (1, 1); --LSN 100
> > > > > 3. Insert t1 (2, 2); --LSN 110
> > > > > 4. Alter t1 Add Column c3; --LSN 120
> > > > > 5. Insert t1 (3, 3, 3); --LSN 130
> > > > And table sync errors out.
> > > > There can be one more issue, since we took the pg_dump without a
> > > > snapshot (wrt the replication slot).
> > > >
> > >
> > > To avoid both the problems mentioned for Refresh Publication, we can do
> > > one of the following: (a) create a new slot along with a snapshot for this
> > > operation and drop it afterward; or (b) using the existing slot, establish a
> > > new snapshot using a technique proposed in email [1].
> > >
> >
> > Thanks, I think option (b) will be perfect, since we don’t have to create a new slot.
>
> Regarding (b), does it mean that apply worker stops streaming,
> requests to create a snapshot, and then resumes the streaming?
>

Shouldn't this be done by the backend performing REFRESH PUBLICATION?
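
For reference, the failure in steps 1 to 5 quoted above can be shown with
a small toy model in Python. This is only a sketch under simplifying
assumptions: HISTORY, state_at() and initial_sync() are hypothetical names,
LSNs are plain integers, and "table sync errors out" is modelled as a row
whose width does not match the restored column list. It is not how
tablesync detects the mismatch; it only shows why the schema and the data
need to come from the same point in time.

```python
# Publisher history from the quoted steps: (LSN, columns of t1 after the
# event, row inserted by the event or None for DDL).
HISTORY = [
    (90, ("c1", "c2"), None),            # 1. CREATE TABLE t1(c1, c2)
    (100, ("c1", "c2"), (1, 1)),         # 2. INSERT (1, 1)
    (110, ("c1", "c2"), (2, 2)),         # 3. INSERT (2, 2)
    (120, ("c1", "c2", "c3"), None),     # 4. ALTER TABLE t1 ADD COLUMN c3
    (130, ("c1", "c2", "c3"), (3, 3, 3)),  # 5. INSERT (3, 3, 3)
]


def state_at(lsn):
    """Return (columns, rows) of t1 as visible at the given LSN."""
    columns, rows = None, []
    for event_lsn, cols, row in HISTORY:
        if event_lsn > lsn:
            break
        columns = cols
        # Rows that existed before a column was added read as NULL for it.
        rows = [r + (None,) * (len(cols) - len(r)) for r in rows]
        if row is not None:
            rows.append(row)
    return columns, rows


def initial_sync(schema_lsn, data_lsn):
    """Restore the schema as of schema_lsn, then copy rows as of data_lsn."""
    columns, _ = state_at(schema_lsn)
    _, rows = state_at(data_lsn)
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"row {row} does not fit columns {columns}")
    return columns, rows
```

Here initial_sync(90, 130) raises, which corresponds to the schema being
dumped right after step 1 while the copy starts after step 5. When both
arguments are the same LSN, as they would be if the dump and the copy
shared one snapshot, the call succeeds for any point in the history.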

--
With Regards,
Amit Kapila.
