Re: Initial Schema Sync for Logical Replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Kumar, Sachin" <ssetiya(at)amazon(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Initial Schema Sync for Logical Replication
Date: 2023-03-29 15:18:04
Message-ID: CAD21AoANtgtqSavuhCn6Q3Qigogb05tKQ6mAQgVKhfZ0ysFrRw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 29, 2023 at 7:57 PM Kumar, Sachin <ssetiya(at)amazon(dot)com> wrote:
>
> > > > > From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > > > > > I think we won't be able to use same snapshot because the
> > > > > > transaction will be committed.
> > > > > > In CreateSubscription() we can use the transaction snapshot from
> > > > > > walrcv_create_slot() till walrcv_disconnect() is called.(I am
> > > > > > not sure about this part maybe walrcv_disconnect() calls the commits
> > > > > > internally ?).
> > > > > > So somehow we need to keep this snapshot alive, even after
> > > > > > transaction is committed(or delay committing the transaction ,
> > > > > > but we can have CREATE SUBSCRIPTION with ENABLED=FALSE, so we
> > > > > > can have a restart before tableSync is able to use the same
> > > > > > snapshot.)
> > > > > >
> > > > >
> > > > > Can we think of getting the table data as well along with schema
> > > > > via pg_dump? Won't then both schema and initial data will
> > > > > correspond to the same snapshot?
> > > >
> > > > Right , that will work, Thanks!
> > >
> > > While it works, we cannot get the initial data in parallel, no?
> > >
>
> I was thinking each TableSync process will call pg_dump --table. This way, if we have N
> tableSync processes, we can have N pg_dump --table=table_name calls running in parallel.
> In fact, we can use --schema-only to get the schema and then let COPY take care of data
> syncing. We will use the same snapshot for pg_dump as well as COPY table.
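
To make the proposal above concrete, here is a rough sketch of what each
tablesync worker might run. It only builds command lines and SQL strings;
nothing is executed. The helper names, the snapshot name, and the database
name are hypothetical, and the sketch assumes an exported snapshot name is
somehow available to every worker, which is exactly the part still under
discussion in this thread.

```python
from typing import Dict, List, Tuple


def quote_ident(name: str) -> str:
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def schema_dump_argv(schema: str, table: str, snapshot: str, dbname: str) -> List[str]:
    """Build a pg_dump command line that dumps only one table's schema.

    --table takes a pattern; mixed-case or special names would need extra
    quoting that this sketch does not attempt.
    """
    return [
        "pg_dump",
        "--schema-only",
        f"--table={schema}.{table}",
        f"--snapshot={snapshot}",
        f"--dbname={dbname}",
    ]


def copy_sql(schema: str, table: str, snapshot: str) -> List[str]:
    """Build the statements a worker could use to COPY under the same snapshot."""
    snap_literal = "'" + snapshot.replace("'", "''") + "'"
    return [
        # SET TRANSACTION SNAPSHOT requires REPEATABLE READ or SERIALIZABLE.
        "BEGIN ISOLATION LEVEL REPEATABLE READ",
        f"SET TRANSACTION SNAPSHOT {snap_literal}",
        f"COPY {quote_ident(schema)}.{quote_ident(table)} TO STDOUT",
        "COMMIT",
    ]


def plan_parallel_sync(
    tables: List[Tuple[str, str]], snapshot: str, dbname: str
) -> List[Dict[str, List[str]]]:
    """One independent plan per table, so N workers can proceed in parallel."""
    return [
        {
            "dump_argv": schema_dump_argv(schema, table, snapshot, dbname),
            "copy_sql": copy_sql(schema, table, snapshot),
        }
        for schema, table in tables
    ]


if __name__ == "__main__":
    # Hypothetical snapshot name and database, for illustration only.
    for plan in plan_parallel_sync(
        [("public", "t1"), ("public", "t2")], "00000003-0000001B-1", "postgres"
    ):
        print(" ".join(plan["dump_argv"]))
        print("; ".join(plan["copy_sql"]))
```

Each plan is independent of the others, so the parallelism comes from
running one plan per worker; whether the snapshot can outlive the
transaction that exported it is not addressed here.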

How can we postpone creating the pg_subscription_rel entries until the
tablesync worker starts and does the schema sync? Since a
pg_subscription_rel entry needs the table OID, we need either to do
the schema sync before creating the entry (i.e., during CREATE
SUBSCRIPTION) or to postpone creating the entries as Amit proposed[1]. The
apply worker needs to know which tables to sync in order to
launch the tablesync workers, but the table schema has to be created
before that information is available.
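
The ordering problem can be shown with a toy model. This is not PostgreSQL
code: ToySubscriber and its methods are hypothetical names, and the only
facts taken from the real system are that a pg_subscription_rel row
references the local table by OID (srrelid) and starts in state 'i'.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToySubscriber:
    """Toy stand-in for the subscriber's catalogs."""
    next_oid: int = 16384
    table_oids: Dict[str, int] = field(default_factory=dict)
    # Rows of (srsubid, srrelid, srsubstate)
    subscription_rel: List[Tuple[int, int, str]] = field(default_factory=list)

    def create_table(self, name: str) -> int:
        """Schema sync for one table: the table gets an OID only now."""
        if name not in self.table_oids:
            self.table_oids[name] = self.next_oid
            self.next_oid += 1
        return self.table_oids[name]

    def add_subscription_rel(self, subid: int, name: str) -> None:
        """Adding a row needs the local OID, so the table must already exist."""
        if name not in self.table_oids:
            raise LookupError(f"no local table {name!r}, so no OID for srrelid")
        self.subscription_rel.append((subid, self.table_oids[name], "i"))


def create_subscription_entries_only(sub: ToySubscriber, subid: int, published: List[str]) -> None:
    """Create the entries first and leave the schema sync to the tablesync workers."""
    for name in published:
        sub.add_subscription_rel(subid, name)


def create_subscription_with_schema_sync(sub: ToySubscriber, subid: int, published: List[str]) -> None:
    """Do the schema sync during CREATE SUBSCRIPTION, then create the entries."""
    for name in published:
        sub.create_table(name)
        sub.add_subscription_rel(subid, name)


if __name__ == "__main__":
    early = ToySubscriber()
    try:
        create_subscription_entries_only(early, 1, ["t1"])
    except LookupError as exc:
        print("entries before schema sync fail:", exc)

    synced = ToySubscriber()
    create_subscription_with_schema_sync(synced, 1, ["t1", "t2"])
    print(synced.subscription_rel)
```

In this model the first ordering fails because there is no OID to store,
which leaves the two options mentioned above: sync the schema during
CREATE SUBSCRIPTION, or postpone creating the entries.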

Regards,

[1] https://www.postgresql.org/message-id/CAA4eK1Ld9-5ueomE_J5CA6LfRo%3DwemdTrUp5qdBhRFwGT%2BdOUw%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
