Re: Initial Schema Sync for Logical Replication

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	"Kumar, Sachin" <ssetiya(at)amazon(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Initial Schema Sync for Logical Replication
Date:	2023-03-29 15:18:04
Message-ID:	CAD21AoANtgtqSavuhCn6Q3Qigogb05tKQ6mAQgVKhfZ0ysFrRw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Mar 29, 2023 at 7:57 PM Kumar, Sachin <ssetiya(at)amazon(dot)com> wrote:
>
> > > > > From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > > > > > I think we won't be able to use same snapshot because the
> > > > > > transaction will be committed.
> > > > > > In CreateSubscription() we can use the transaction snapshot from
> > > > > > walrcv_create_slot() till walrcv_disconnect() is called.(I am
> > > > > > not sure about this part maybe walrcv_disconnect() calls the commits
> > > > > > internally ?).
> > > > > > So somehow we need to keep this snapshot alive, even after
> > > > > > transaction is committed(or delay committing the transaction ,
> > > > > > but we can have CREATE SUBSCRIPTION with ENABLED=FALSE, so we
> > > > > > can have a restart before tableSync is able to use the same
> > > > > > snapshot.)
> > > > > >
> > > > >
> > > > > Can we think of getting the table data as well along with schema
> > > > > via pg_dump? Won't then both schema and initial data will
> > > > > correspond to the same snapshot?
> > > >
> > > > Right , that will work, Thanks!
> > >
> > > While it works, we cannot get the initial data in parallel, no?
> > >
>
> I was thinking each TableSync process will call pg_dump --table. This way, if we have N
> tableSync processes, we can have N pg_dump --table=table_name calls running in parallel.
> In fact, we can use --schema-only to get the schema and then let COPY take care of data
> syncing. We will use the same snapshot for pg_dump as well as COPY table.
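
As a rough illustration of the quoted proposal, the sketch below builds one pg_dump command line per table, all referring to the same exported snapshot. The --schema-only, --table and --snapshot options are existing pg_dump options; the helper name schema_dump_commands, the database name, the table names and the snapshot identifier are made up for the example. It only constructs the commands, and it is not meant to reflect how a tablesync worker would actually invoke pg_dump.

```python
from typing import List


def schema_dump_commands(dbname: str, tables: List[str], snapshot: str) -> List[List[str]]:
    """Build one schema-only pg_dump argv per table, all sharing one snapshot.

    Hypothetical helper: it only constructs the commands, it does not run them.
    """
    commands = []
    for table in tables:
        commands.append([
            "pg_dump",
            "--schema-only",           # dump definitions only; COPY would move the data
            f"--table={table}",        # restrict the dump to a single table
            f"--snapshot={snapshot}",  # reuse a snapshot that was exported elsewhere
            dbname,
        ])
    return commands


if __name__ == "__main__":
    # Placeholder values, assumed purely for the example.
    for argv in schema_dump_commands("postgres", ["public.t1", "public.t2"], "00000003-0000001B-1"):
        print(" ".join(argv))
```

Since each command is independent, N of them could in principle be run by N separate processes, which is the parallelism the quoted text has in mind.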

How can we postpone creating the pg_subscription_rel entries until the
tablesync worker starts and does the schema sync? I think that since
a pg_subscription_rel entry needs the table OID, we need either to do
the schema sync before creating the entry (i.e., during CREATE
SUBSCRIPTION) or to postpone creating the entries as Amit proposed[1]. The
apply worker needs the information about the tables to sync in order to
launch the tablesync workers, but it needs to create the table schema
to get that information.
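
The ordering problem can be pictured with a toy model. This is only a sketch: the Subscriber class, its methods and the OID numbering are hypothetical stand-ins for the subscriber's catalog and pg_subscription_rel, not PostgreSQL code.

```python
from typing import Dict, List


class Subscriber:
    """Toy stand-in for the subscriber side (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.table_oids: Dict[str, int] = {}   # local tables, name -> OID
        self.subscription_rel: List[int] = []  # stand-in for pg_subscription_rel
        self._next_oid = 16384                 # arbitrary starting value

    def create_table(self, name: str) -> int:
        """Schema sync step: the table gets an OID only once it exists locally."""
        oid = self._next_oid
        self._next_oid += 1
        self.table_oids[name] = oid
        return oid

    def add_rel_entry(self, name: str) -> bool:
        """Record the table for syncing; fails if the table has no OID yet."""
        oid = self.table_oids.get(name)
        if oid is None:
            return False
        self.subscription_rel.append(oid)
        return True


if __name__ == "__main__":
    sub = Subscriber()
    print(sub.add_rel_entry("t1"))  # entry cannot be created before the schema
    sub.create_table("t1")          # schema sync happens first ...
    print(sub.add_rel_entry("t1"))  # ... and only then can the entry be added
```

In this model, either the schema is created before the entry is added, or adding the entry has to wait, which mirrors the two options above.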

Regards,

[1] https://www.postgresql.org/message-id/CAA4eK1Ld9-5ueomE_J5CA6LfRo%3DwemdTrUp5qdBhRFwGT%2BdOUw%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

RE: Initial Schema Sync for Logical Replication at 2023-03-29 10:57:49 from Kumar, Sachin

Responses

Re: Initial Schema Sync for Logical Replication at 2023-03-30 13:11:50 from Masahiko Sawada
