Re: Initial Schema Sync for Logical Replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Kumar, Sachin" <ssetiya(at)amazon(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Initial Schema Sync for Logical Replication
Date: 2023-04-21 08:47:31
Message-ID: CAD21AoCdfg506__qKz+HX8vqfdyKgQ5qeehgqq9bi1L-6p5Pwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 20, 2023 at 8:16 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Apr 17, 2023 at 9:12 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Fri, Apr 7, 2023 at 6:37 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Apr 6, 2023 at 6:57 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > >
> > > > While writing a PoC patch, I found some difficulties in this idea.
> > > > First, I tried to add schemaname+relname to pg_subscription_rel but I
> > > > could not define the primary key of pg_subscription_rel. The primary
> > > > key on (srsubid, srrelid) doesn't work since srrelid could be NULL.
> > > > Similarly, the primary key on (srsubid, srrelid, schemaname, relname)
> > > > also doesn't work.
> > > >
> > >
> > > Can we think of having a separate catalog table say
> > > pg_subscription_remote_rel for this? You can have srsubid,
> > > remote_schema_name, remote_rel_name, etc. We may need some other state
> > > to be maintained during the initial schema sync where this table can
> > > be used. Basically, this can be used to maintain the state till the
> > > initial schema sync is complete because we can create a relation entry
> > > in pg_subscritption_rel only after the initial schema sync is
> > > complete.
> >
> > It might not be ideal but I guess it works. But I think we need to
> > modify the name of replication slot for initial sync as it currently
> > includes OID of the table:
> >
> > void
> > ReplicationSlotNameForTablesync(Oid suboid, Oid relid,
> > char *syncslotname, Size szslot)
> > {
> > snprintf(syncslotname, szslot, "pg_%u_sync_%u_" UINT64_FORMAT, suboid,
> > relid, GetSystemIdentifier());
> > }
> >
> > If we use both schema name and table name, it's possible that slot
> > names are duplicated if schema and/or table names are long. Another
> > idea is to use the hash value of schema+table names, but it cannot
> > completely eliminate that possibility, and probably would make
> > investigation and debugging hard in case of any failure. Probably we
> > can use the OID of each entry in pg_subscription_remote_rel instead,
> > but I'm not sure it's a good idea, mainly because we will end up using
> > twice as many OIDs as before.
> >
>
> The other possibility is to use worker_pid. To make debugging easier,
> we may want to LOG schema_name+rel_name vs slot_name mapping at DEBUG1
> log level.

Since worker_pid changes after the worker restarts, a new worker
cannot find the slot that had been used, no?

After thinking it over, a better solution would be that we add an oid
column, nspname column, and relname column to pg_subscription_rel and
the primary key on the oid. If the table is not present on the
subscriber we store the schema name and table name to the catalog, and
otherwise we store the local table oid same as today. The local table
oid will be filled after the schema sync. The names of origin and
replication slot the tablesync worker uses use the oid instead of the
table oid.

I've attached a PoC patch of this idea (very rough patch and has many
TODO comments). It mixes the following changes:

1. Add oid column to the pg_subscription_rel. The oid is used as the
primary key and in the names of origin and slot the tablesync workers
use.

2. Add copy_schema = on/off option to CREATE SUBSCRIPTION (not yet
support for ALTER SUBSCRIPTION).

3. Add CRS_EXPORT_USE_SNAPSHOT new action in order to use the same
snapshot by both walsender and other processes (e.g. pg_dump). In this
patch, the snapshot is exported for pg_dump and is used by the
walsender for COPY.

It seems to work well but there might be a pitfall as I've not fully
implemented it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
0001-Poc-initial-table-structure-synchronization-in-logic.patch application/octet-stream 55.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2023-04-21 08:49:48 Re: [PATCH] Extend the length of BackgroundWorker.bgw_library_name
Previous Message Masahiko Sawada 2023-04-21 08:47:23 Re: Initial Schema Sync for Logical Replication