Re: Initial Schema Sync for Logical Replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Kumar, Sachin" <ssetiya(at)amazon(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Initial Schema Sync for Logical Replication
Date: 2023-04-03 15:15:29
Message-ID: CAD21AoA8aBPGZ0Zd=M0qiYD1iRaACdonBy+foJTfhq_7sRCH1Q@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 3, 2023 at 3:54 PM Kumar, Sachin <ssetiya(at)amazon(dot)com> wrote:
>
>
>
> > -----Original Message-----
> > From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> > > > I was thinking each tableSync process will call pg_dump --table.
> > > > This way, if we have N tableSync processes, we can have N
> > > > pg_dump --table=table_name calls in parallel.
> > > > In fact we can use --schema-only to get the schema and then let COPY
> > > > take care of the data syncing. We will use the same snapshot for pg_dump
> > > > as well as for COPY table.
> > >
> > > How can we postpone creating the pg_subscription_rel entries until the
> > > tablesync worker starts and does the schema sync? I think that since
> > > pg_subscription_rel entry needs the table OID, we need either to do
> > > the schema sync before creating the entry (i.e., during CREATE
> > > SUBSCRIPTION) or to postpone creating entries as Amit proposed[1]. The
> > > apply worker needs the information of tables to sync in order to
> > > launch the tablesync workers, but it needs to create the table schema
> > > to get that information.
> >
> > For the above reason, I think that step 6 of the initial proposal won't work.
> >
> > If we can have the tablesync worker create an entry of pg_subscription_rel after
> > creating the table, it may give us the flexibility to perform the initial sync. One
> > idea is that we add a relname field to pg_subscription_rel so that we can create
> > entries with relname instead of OID if the table is not created yet. Once the
> > table is created, we clear the relname field and set the OID of the table instead.
> > It's not an ideal solution but we might make it simpler later.
> >
> > Assuming that it's feasible, I'm considering another approach for the initial sync
> > in order to address the concurrent DDLs.
> >
> > The basic idea is to somewhat follow how pg_dump/restore to dump/restore
> > the database data. We divide the synchronization phase (including both schema
> > and data) up into three phases: pre-data, table-data, post-data. These mostly
> > follow the --section option of pg_dump.
> >
> > 1. The backend process performing CREATE SUBSCRIPTION creates the
> > subscription but doesn't create pg_subscription_rel entries yet.
> >
> > 2. Before starting the streaming, the apply worker fetches the table list from the
> > publisher, creates pg_subscription_rel entries for them, and dumps+restores
> > database objects that tables could depend on but that don't themselves depend on
> > tables, such as TYPE, OPERATOR, ACCESS METHOD, etc. (i.e. pre-data).
>
> We will not have the slot's starting snapshot, so somehow we have to get a new
> snapshot and skip all the WAL records between the start of the slot and the
> snapshot creation LSN?

Yes. Or we can somehow postpone creating pg_subscription_rel entries
until the tablesync workers create the tables, or we can request the
walsender to establish a new snapshot using the technique proposed in [1].
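For reference, the replication protocol already lets a walsender export a
snapshot at slot creation time; a rough sketch of what a tablesync worker's
walsender connection could issue (the slot name is illustrative, and the
option spelling is `(SNAPSHOT 'export')` on recent servers, `EXPORT_SNAPSHOT`
on older ones):

```sql
-- On a replication connection (replication=database in the conninfo):
-- create the slot and have the walsender export a snapshot that a
-- separate connection can then adopt with SET TRANSACTION SNAPSHOT.
CREATE_REPLICATION_SLOT tablesync_slot_1 LOGICAL pgoutput (SNAPSHOT 'export');
```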

>
> >
> > 3. The apply worker launches the tablesync workers for tables that need to be
> > synchronized.
> >
> > There might be DDLs executed on the publisher for tables before the tablesync
> > worker starts. But the apply worker needs to apply DDLs for pre-data database
> > objects. OTOH, it can ignore DDLs for not-synced-yet tables and other database
> > objects such as INDEX, TRIGGER, RULE, etc (i.e. post-data).
> >
> > 4. The tablesync worker creates its replication slot, dumps+restores the table
> > schema, updates pg_subscription_rel, and performs COPY.
> >
> > These operations should be done in the same transaction.
>
> pg_restore won't be rollbackable, so we need to maintain states in pg_subscription_rel.

Yes. But I think it depends on how we restore them. For example, if we
have the tablesync worker somehow restore the table using a new SQL
function returning the table schema, as we discussed, or by executing the
dump file via SPI, we can do that in the same transaction.
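Since DDL is transactional in PostgreSQL, any restore executed from inside
the tablesync worker's own transaction (e.g. via SPI) would roll back
cleanly with it; a minimal illustration (the table name is hypothetical):

```sql
BEGIN;
CREATE TABLE public.sync_target (id int PRIMARY KEY, payload text);
-- ... COPY the initial data, update the pg_subscription_rel entry ...
ROLLBACK;  -- on failure, the table, its data, and the catalog update all vanish
```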

>
> >
> > 5. After finishing COPY, the tablesync worker dumps indexes (and perhaps
> > constraints) of the table and creates them (which possibly takes a long time).
> > Then it starts to catch up, same as today. The apply worker needs to wait for the
> > tablesync worker to catch up.
>
> I don't think we can have a CATCHUP stage. We can have a DDL on the publisher which
> adds a new column (and this DDL will be executed by the applier later). Then, when we
> get an INSERT, the insert will fail because we have the old definition of the table.

All DMLs and DDLs associated with the table being synchronized are
applied by the tablesync worker until it catches up with the apply
worker.

>
> >
> > We need to repeat these steps until we complete the initial data copy and create
> > indexes for all tables, IOW until all pg_subscription_rel status becomes READY.
> >
> > 6. If the apply worker confirms all tables are READY, it starts another sync
> > worker that is responsible for the post-data database objects such as TRIGGER,
> > RULE, POLICY, etc. (i.e. post-data).
> >
> > While the sync worker is starting up or working, the apply worker applies
> > changes for pre-data database objects as well as READY tables.
> We might have some issues if we have a CREATE TABLE like:
> CREATE TABLE table_name AS SELECT * FROM materialized_view.

Could you elaborate on the scenario where we could have an issue with such DDL?

> >
> > 7. Similar to the tablesync worker, this sync worker creates its replication slot
> > and sets the returned LSN somewhere, say pg_subscription.
> >
> > 8. The sync worker dumps and restores these objects, which could take time
> > since it would need to create FK constraints. Then it starts to catch up if the
> > apply worker is ahead. The apply worker waits for the sync worker to catch up.
> >
> > 9. Once the sync worker catches up, the apply worker starts applying changes
> > for all database objects.
> >
> > IIUC with this approach, we can resolve the concurrent DDL problem Sachin
> > mentioned, and indexes (and constraints) are created after the initial data copy.
> >
> > The procedures are still very complex and not fully considered yet but I hope
> > there are some useful things at least for discussion.
> >
> > Probably we can start with supporting only tables. In this case, we would not
> > need the post-data phase (i.e. steps 6-9). It seems to me that we need to have
> > the subscription state somewhere (maybe pg_subscription) so that the apply
> > worker can figure out the next step.
> > Since we need to dump and restore different objects on different timings, we
> > probably cannot directly use pg_dump/pg_restore. I've not considered how the
> > concurrent REFRESH PUBLICATION works.
>
> I think the above prototype will work and will have the least amount of side effects,
> but it might be too complex to implement and I am not sure about corner cases.
>
> I was thinking of other ways of doing the initial sync, which are less complex but
> each with its own set of bottlenecks.
>
> On Publisher Side:-
> 1) Locking the publisher: Easiest one to implement. The applier process will take an
> ACCESS SHARE lock on all the published tables (we don't have to worry about newly
> created concurrent tables). As each tableSync worker finishes syncing a table, it will
> release that table's lock, so we release the table locks in steps. Users can still
> perform DML on the tables, but DDL won't be allowed.

Do you mean that the apply worker acquires table locks and the
tablesync workers release them? If so, how can we implement it?
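For context, the lock being proposed would be something like the following,
taken per published table (the table name is illustrative). Heavyweight locks
belong to the backend that acquired them and are released at transaction end,
which is exactly why having a different process release them is the open
question:

```sql
BEGIN;
-- ACCESS SHARE conflicts only with ACCESS EXCLUSIVE, so concurrent DML
-- proceeds, but ALTER TABLE / DROP TABLE (which need ACCESS EXCLUSIVE) block.
LOCK TABLE public.t1 IN ACCESS SHARE MODE;
-- the lock is held by this backend until its transaction ends
```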

>
> 2) Using pg_dump/pg_restore for schema and data: As Amit mentioned, we can use
> pg_dump/pg_restore [1]. Although it might have the side effect of using double the
> storage, we can take a pg_dump of each table separately and delete the dump as soon
> as the table is synced. The tableSync process will read the dump and call pg_restore
> on the table. If we crash in the middle of restoring the tables, we can start
> pg_dump(--clean)/restore again for the remaining tables.
> With this we can reduce space usage, but we might create too many files.

With this idea, who does the pg_dump and pg_restore? And when do we create
the pg_subscription_rel entries?
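A rough sketch of the per-table dump/restore idea being quoted above, using
only existing pg_dump/pg_restore options (the connection strings and the
snapshot name are placeholders):

```shell
# Dump one table under a shared snapshot, restore it on the subscriber,
# then delete the dump to free the space as soon as the table is synced.
pg_dump --table=public.t1 --snapshot=00000003-0000001B-1 \
        --format=custom --file=t1.dump "$PUBLISHER_CONNINFO"
pg_restore --dbname="$SUBSCRIBER_CONNINFO" --clean --if-exists t1.dump
rm t1.dump
```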

>
> 3) Using a publisher snapshot: The applier process will do pg_dump/pg_restore as
> usual. Then the applier process will start a new process P1, which will connect to
> the publisher, start a transaction, and call pg_export_snapshot() to export the
> snapshot. The applier process will then take the snapshot string and pass it to each
> tableSync process as an argument. tableSync will use this snapshot for COPY TABLE.
> tableSync should only do COPY TABLE and then exit, so we won't do any catchup phase
> in tableSync. After all tables finish COPY, the transaction will be committed by the
> P1 process and it will exit.
> In the case of a crash/restart we can simply start from the beginning, since nothing
> is committed until every table is synced. There are two main issues with this approach:
> 1. I am not sure what side effects we might have on the publisher, since we might
> have to keep the transaction open for a long time.

I'm concerned that it would not be an acceptable downside that we keep
a transaction open until all tables are synchronized.

> 2. The applier process will simply wait till all tables are synced.
> Since the applier process won't be able to apply any WAL till all tables are synced,
> maybe instead of creating a new process, the applier process itself can start the
> transaction / export the snapshot, and the tableSync processes will use that snapshot.
> After all tables are synced it can start WAL streaming.

I think that after users execute REFRESH PUBLICATION, there are mixed
non-ready and ready tables in the subscription. In this case, it's a
huge restriction for users that logical replication for the ready
tables stops until all newly-subscribed tables are synchronized.
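For reference, the snapshot hand-off in idea (3) maps onto existing
primitives; the exported snapshot string and table name below are
illustrative:

```sql
-- Session P1 on the publisher: keep this transaction open until all
-- tables have been copied, then COMMIT.
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();   -- returns e.g. '00000003-0000001B-1'

-- Each tableSync connection, before its COPY (the importing transaction
-- must adopt the snapshot before running any other query):
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000003-0000001B-1';
COPY public.t1 TO STDOUT;
COMMIT;
```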

Regards,

[1] https://www.postgresql.org/message-id/CAGPVpCRWEVhXa7ovrhuSQofx4to7o22oU9iKtrOgAOtz_%3DY6vg%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
