Re: Initial Schema Sync for Logical Replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Kumar, Sachin" <ssetiya(at)amazon(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Initial Schema Sync for Logical Replication
Date: 2023-03-30 13:11:50
Message-ID: CAD21AoAnu8TM1YLziSvgyypTiezek405U5vD340HOsp65Rp+gw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 30, 2023 at 12:18 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Mar 29, 2023 at 7:57 PM Kumar, Sachin <ssetiya(at)amazon(dot)com> wrote:
> >
> > > > > > From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > > > > > > I think we won't be able to use the same snapshot because the
> > > > > > > transaction will be committed.
> > > > > > > In CreateSubscription() we can use the transaction snapshot from
> > > > > > > walrcv_create_slot() till walrcv_disconnect() is called. (I am
> > > > > > > not sure about this part; maybe walrcv_disconnect() commits
> > > > > > > internally?).
> > > > > > > So somehow we need to keep this snapshot alive, even after the
> > > > > > > transaction is committed (or delay committing the transaction,
> > > > > > > but we can have CREATE SUBSCRIPTION with ENABLED=FALSE, so there
> > > > > > > could be a restart before tablesync is able to use the same
> > > > > > > snapshot).
> > > > > > >
> > > > > >
> > > > > > Can we think of getting the table data as well, along with the
> > > > > > schema, via pg_dump? Won't both the schema and the initial data
> > > > > > then correspond to the same snapshot?
> > > > >
> > > > > Right, that will work, thanks!
> > > >
> > > > While it works, we cannot get the initial data in parallel, no?
> > > >
> >
> > I was thinking each tablesync process will call pg_dump --table. This way, if we have N
> > tablesync processes, we can have N pg_dump --table=table_name invocations running in parallel.
> > In fact, we can use --schema-only to get the schema and then let COPY take care of the data
> > syncing. We will use the same snapshot for pg_dump as well as for COPY.
>
> How can we postpone creating the pg_subscription_rel entries until the
> tablesync worker starts and does the schema sync? I think that since a
> pg_subscription_rel entry needs the table OID, we need either to do
> the schema sync before creating the entry (i.e., during CREATE
> SUBSCRIPTION) or to postpone creating the entries as Amit proposed[1].
> The apply worker needs the information about which tables to sync in
> order to launch the tablesync workers, but the table schemas need to
> be created first to get that information.

For the above reason, I think that step 6 of the initial proposal won't work.

If we can have the tablesync worker create its pg_subscription_rel
entry after creating the table, it would give us more flexibility in
performing the initial sync. One idea is to add a relname field to
pg_subscription_rel so that we can create entries with the relation
name instead of the OID if the table is not created yet. Once the
table is created, we clear the relname field and set the table's OID
instead. It's not an ideal solution, but we might be able to simplify
it later.
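
To illustrate the lifecycle I have in mind (purely for illustration:
the column name "srrelname", the OIDs, and the rows below are made up;
srrelname does not exist today), an entry would go through something
like:

    # Illustration only: "srrelname" is a hypothetical new column.
    psql -d "$subscriber_db" -c \
      "SELECT srsubid, srrelid, srrelname, srsubstate FROM pg_subscription_rel;"
    #  srsubid | srrelid | srrelname  | srsubstate
    # ---------+---------+------------+------------
    #    16394 |       0 | public.tbl | i    <- table not yet created locally
    #    16394 |   24576 |            | d    <- after schema sync: OID set, relname cleared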

Assuming that it's feasible, I'm considering another approach for the
initial sync in order to address concurrent DDL.

The basic idea is to roughly follow how pg_dump/pg_restore dump and
restore database data. We divide the synchronization phase (covering
both schema and data) into three phases: pre-data, table-data, and
post-data. These mostly follow the --section option of pg_dump.
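
For reference, pg_dump already exposes this split via --section (these
options exist today; note that pg_dump's pre-data section also contains
the CREATE TABLE statements, whereas in the proposal below the table
definitions would be dumped per table during the table-data phase):

    # pg_dump's existing --section option, run against the publisher:
    pg_dump --schema-only --section=pre-data  -d "$publisher_conninfo"  # types, operators, table definitions, ...
    pg_dump --section=data                    -d "$publisher_conninfo"  # table contents
    pg_dump --schema-only --section=post-data -d "$publisher_conninfo"  # indexes, constraints, triggers, rules, ...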

1. The backend process performing CREATE SUBSCRIPTION creates the
subscription but doesn't create pg_subscription_rel entries yet.

2. Before starting the streaming, the apply worker fetches the table
list from the publisher, creates pg_subscription_rel entries for them,
and dumps+restores the database objects that tables could depend on
but that don't themselves depend on tables, such as TYPE, OPERATOR,
ACCESS METHOD, etc. (i.e. pre-data).
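
A rough sketch of what step 2 could look like, assuming the table list
comes from pg_publication_tables and that the worker filters out the
table definitions rather than using pg_dump's pre-data section exactly
as-is (the conninfo/dbname variables are placeholders):

    # Sketch of the apply worker's pre-data sync (details are assumptions):
    # fetch the tables to sync, e.g. based on
    #   SELECT schemaname, tablename FROM pg_publication_tables WHERE pubname = 'mypub';
    # create pg_subscription_rel entries for them, then restore pre-data objects:
    pg_dump --schema-only --section=pre-data -d "$publisher_conninfo" \
        | psql -d "$subscriber_db"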

3. The apply worker launches the tablesync workers for tables that
need to be synchronized.

There might be DDL executed on the publisher against tables before the
tablesync worker starts. The apply worker still needs to apply DDL for
pre-data database objects, but it can ignore DDL for not-yet-synced
tables and for other database objects such as INDEX, TRIGGER, RULE,
etc. (i.e. post-data).

4. The tablesync worker creates its replication slot, dumps+restores
the table schema, updates the pg_subscription_rel entry, and performs
COPY.

These operations should be done in the same transaction.
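
A sketch of step 4, assuming the slot's snapshot is exported so that a
separate pg_dump connection can use it (the snapshot name, table name,
and conninfo below are placeholders):

    # Sketch of one tablesync worker:
    # 1. on a replication connection to the publisher, create the tablesync
    #    slot and export its snapshot (CREATE_REPLICATION_SLOT ... LOGICAL
    #    pgoutput with the snapshot-export option);
    # 2. dump and restore only this table's definition using that snapshot:
    pg_dump --schema-only --table=public.tbl --snapshot=00000003-000001B2-1 \
        -d "$publisher_conninfo" | psql -d "$subscriber_db"
    # 3. update the pg_subscription_rel entry with the new table's OID;
    # 4. COPY the table data under the same snapshot, so schema and data
    #    correspond to the same point in time.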

5. After finishing COPY, the tablesync worker dumps the indexes (and
perhaps constraints) of the table and creates them (which could take a
long time). Then it starts to catch up, the same as today. The apply
worker needs to wait for the tablesync worker to catch up.

We need to repeat these steps until we complete the initial data copy
and create indexes for all tables, in other words, until the status of
all pg_subscription_rel entries becomes READY.
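
For example, the apply worker could consider this phase done once a
check like the following returns no rows (pg_subscription_rel and the
'r' state exist today; filtering by subscription is omitted here for
brevity):

    # all tables are READY when no entry remains in another state:
    psql -d "$subscriber_db" -c \
      "SELECT srrelid::regclass, srsubstate
         FROM pg_subscription_rel
        WHERE srsubstate <> 'r';"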

6. Once the apply worker confirms that all tables are READY, it starts
another sync worker that is responsible for database objects such as
TRIGGER, RULE, POLICY, etc. (i.e. post-data).

While the sync worker is starting up or working, the apply worker
applies changes for pre-data database objects as well as READY tables.

7. Similar to the tablesync worker, this sync worker creates its
replication slot and stores the returned LSN somewhere, say in
pg_subscription.

8. The sync worker dumps and restores these objects, which could take
some time since it would need to create FK constraints. Then it starts
to catch up if the apply worker is ahead. The apply worker waits for
the sync worker to catch up.
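
Steps 6-8 could again lean on pg_dump's existing post-data section,
roughly as below (a sketch; the worker would need to restrict it to the
subscribed tables and skip the indexes already created in step 5; the
table names are placeholders):

    # Sketch of the post-data sync: mainly triggers, rules, FK constraints, etc.
    pg_dump --schema-only --section=post-data -d "$publisher_conninfo" \
        --table=public.tbl1 --table=public.tbl2 \
        | psql -d "$subscriber_db"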

9. Once the sync worker catches up, the apply worker starts applying
changes for all database objects.

IIUC with this approach, we can resolve the concurrent DDL problem
Sachin mentioned, and indexes (and constraints) are created after the
initial data copy.

The procedure is still very complex and not fully thought through yet,
but I hope there is something useful here at least for discussion.

Probably we can start with supporting only tables. In that case, we
would not need the post-data phase (i.e. steps 6-9). It seems to me
that we need to keep the subscription state somewhere (maybe
pg_subscription) so that the apply worker can figure out the next step
(a rough illustration is below). Since we need to dump and restore
different objects at different times, we probably cannot use
pg_dump/pg_restore directly. I've not yet considered how concurrent
REFRESH PUBLICATION would work.
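
Purely as an illustration of that state tracking (the column name
"subsyncphase" and its values are made up here; no such column exists),
it could look something like:

    # hypothetical per-subscription sync-phase column, for illustration only:
    psql -d "$subscriber_db" -c "SELECT subname, subsyncphase FROM pg_subscription;"
    #  subname | subsyncphase
    # ---------+--------------
    #  mysub   | table-data     <- e.g. pre-data -> table-data -> post-data -> done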

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
