Re: Initial Schema Sync for Logical Replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Wei Wang (Fujitsu)" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Kumar, Sachin" <ssetiya(at)amazon(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Initial Schema Sync for Logical Replication
Date: 2023-04-28 07:16:14
Message-ID: CAD21AoC0qaBgUxtB-1KqvXO2hmKvPijsy6yE05_zNrb4MW+yUw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 27, 2023 at 12:02 PM Wei Wang (Fujitsu)
<wangw(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Fri, Apr 21, 2023 at 16:48 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > On Thu, Apr 20, 2023 at 8:16 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Apr 17, 2023 at 9:12 AM Masahiko Sawada
> > <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Fri, Apr 7, 2023 at 6:37 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Thu, Apr 6, 2023 at 6:57 PM Masahiko Sawada
> > <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > > >
> > > > > >
> > > > > > While writing a PoC patch, I found some difficulties in this idea.
> > > > > > First, I tried to add schemaname+relname to pg_subscription_rel but I
> > > > > > could not define the primary key of pg_subscription_rel. The primary
> > > > > > key on (srsubid, srrelid) doesn't work since srrelid could be NULL.
> > > > > > Similarly, the primary key on (srsubid, srrelid, schemaname, relname)
> > > > > > also doesn't work.
> > > > > >
> > > > >
> > > > > Can we think of having a separate catalog table say
> > > > > pg_subscription_remote_rel for this? You can have srsubid,
> > > > > remote_schema_name, remote_rel_name, etc. We may need some other
> > state
> > > > > to be maintained during the initial schema sync where this table can
> > > > > be used. Basically, this can be used to maintain the state till the
> > > > > initial schema sync is complete because we can create a relation entry
> > > > > in pg_subscritption_rel only after the initial schema sync is
> > > > > complete.
> > > >
> > > > It might not be ideal but I guess it works. But I think we need to
> > > > modify the name of replication slot for initial sync as it currently
> > > > includes OID of the table:
> > > >
> > > > void
> > > > ReplicationSlotNameForTablesync(Oid suboid, Oid relid,
> > > > char *syncslotname, Size szslot)
> > > > {
> > > > snprintf(syncslotname, szslot, "pg_%u_sync_%u_" UINT64_FORMAT,
> > suboid,
> > > > relid, GetSystemIdentifier());
> > > > }
> > > >
> > > > If we use both schema name and table name, it's possible that slot
> > > > names are duplicated if schema and/or table names are long. Another
> > > > idea is to use the hash value of schema+table names, but it cannot
> > > > completely eliminate that possibility, and probably would make
> > > > investigation and debugging hard in case of any failure. Probably we
> > > > can use the OID of each entry in pg_subscription_remote_rel instead,
> > > > but I'm not sure it's a good idea, mainly because we will end up using
> > > > twice as many OIDs as before.
> > > >
> > >
> > > The other possibility is to use worker_pid. To make debugging easier,
> > > we may want to LOG schema_name+rel_name vs slot_name mapping at
> > DEBUG1
> > > log level.
> >
> > Since worker_pid changes after the worker restarts, a new worker
> > cannot find the slot that had been used, no?
> >
> > After thinking it over, a better solution would be that we add an oid
> > column, nspname column, and relname column to pg_subscription_rel and
> > the primary key on the oid. If the table is not present on the
> > subscriber we store the schema name and table name to the catalog, and
> > otherwise we store the local table oid same as today. The local table
> > oid will be filled after the schema sync. The names of origin and
> > replication slot the tablesync worker uses use the oid instead of the
> > table oid.
> >
> > I've attached a PoC patch of this idea (very rough patch and has many
> > TODO comments). It mixes the following changes:
> >
> > 1. Add oid column to the pg_subscription_rel. The oid is used as the
> > primary key and in the names of origin and slot the tablesync workers
> > use.
> >
> > 2. Add copy_schema = on/off option to CREATE SUBSCRIPTION (not yet
> > support for ALTER SUBSCRIPTION).
> >
> > 3. Add CRS_EXPORT_USE_SNAPSHOT new action in order to use the same
> > snapshot by both walsender and other processes (e.g. pg_dump). In this
> > patch, the snapshot is exported for pg_dump and is used by the
> > walsender for COPY.
> >
> > It seems to work well but there might be a pitfall as I've not fully
> > implemented it.
>
> Thanks for your POC patch.
> After reviewing this patch, I have a question below that want to confirm:
>
> 1. In the function synchronize_table_schema.
> I think some changes to GUC and table-related object SQLs are included in the
> pg_dump result. And in this POC, these SQLs will be executed. Do we need to
> alter the pg_dump results to only execute the table schema related SQLs?

Yes, in this approach, we need to dump/restore objects while
specifying with fine granularity. Ideally, the table sync worker dumps
and restores the table schema, does copy the initial data, and then
creates indexes, and triggers and table-related objects are created
after that. So if we go with the pg_dump approach to copy the schema
of individual tables, we need to change pg_dump (or libpgdump needs to
be able to do) to support it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Denis Laxalde 2023-04-28 08:08:15 Re: Add PQsendSyncMessage() to libpq
Previous Message Michael Paquier 2023-04-28 07:10:05 Re: Add PQsendSyncMessage() to libpq