RE: Initial Schema Sync for Logical Replication

From: "Kumar, Sachin" <ssetiya(at)amazon(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Initial Schema Sync for Logical Replication
Date: 2023-03-21 01:10:06
Message-ID: e378fb636a694c81b354d3c405f0179d@amazon.com
Lists: pgsql-hackers

Hi Alvaro,

> From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
> Subject: RE: [EXTERNAL]Initial Schema Sync for Logical Replication
> On 2023-Mar-15, Kumar, Sachin wrote:
>
> > 1. In CreateSubscription() when we create replication
> > slot(walrcv_create_slot()), should use CRS_EXPORT_SNAPSHOT, So that we
> can use this snapshot later in the pg_dump.
> >
> > 2. Now we can call pg_dump with above snapshot from CreateSubscription.
>
> Overall I'm not on board with the idea that logical replication would depend on
> pg_dump; that seems like it could run into all sorts of trouble (what if calling
> external binaries requires additional security setup? what about pg_hba
> connection requirements? what about max_connections in tight
> circumstances?).
> what if calling external binaries requires additional security setup
I am not sure what kind of security restrictions would apply in this case; maybe the way the
pg_dump binary is invoked could be changed to accommodate them?
> what about pg_hba connection requirements?
We will use the same connection string that the subscriber process uses to connect to
the publisher.
> what about max_connections in tight circumstances?
Right, that might be an issue, but I don't think it will be a big one. We will create the dump
of the database in the CreateSubscription() function itself, so the dump runs before the
tableSync process even starts; if we have already reached max_connections while calling
pg_dump, tableSync would not be successful either.
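
To make this concrete, here is a rough sketch (purely illustrative, not the actual patch) of how
CreateSubscription() might invoke pg_dump with the snapshot exported by
walrcv_create_slot(..., CRS_EXPORT_SNAPSHOT, ...) and the subscription's connection string.
The helper name dump_publisher_schema(), the output file location, and the error handling are
assumptions made up for this example:

    /*
     * Rough sketch only, not the actual patch.  The helper name, output file
     * and error handling are invented for illustration.
     */
    #include "postgres.h"

    #include "miscadmin.h"          /* my_exec_path */
    #include "utils/psprintf.h"

    static void
    dump_publisher_schema(const char *conninfo,       /* subscription's connection string */
                          const char *snapshot_name)  /* exported by the new slot */
    {
        char        pg_dump_path[MAXPGPATH];
        char       *cmd;

        /* Locate pg_dump next to the running postgres binary, as pg_upgrade does. */
        if (find_other_exec(my_exec_path, "pg_dump",
                            "pg_dump (PostgreSQL) " PG_VERSION "\n",
                            pg_dump_path) < 0)
            ereport(ERROR,
                    (errmsg("could not find a matching pg_dump executable")));

        /* Reuse the subscription's conninfo and the exported snapshot. */
        cmd = psprintf("\"%s\" --schema-only --snapshot=%s -f \"%s\" \"%s\"",
                       pg_dump_path, snapshot_name,
                       "/tmp/initial_schema.sql",     /* illustrative output location */
                       conninfo);

        if (system(cmd) != 0)
            ereport(ERROR,
                    (errmsg("pg_dump failed: %s", cmd)));
    }
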
> It would be much better, I think, to handle this internally in the publisher instead:
> similar to how DDL sync would work, except it'd somehow generate the CREATE
> statements from the existing tables instead of waiting for DDL events to occur. I
> grant that this does require writing a bunch of new code for each object type, a
> lot of which would duplicate the pg_dump logic, but it would probably be a lot
> more robust.
Agreed, but we might end up with a lot of code duplication; essentially almost all of the
pg_dump code would need to be duplicated, which could cause issues when modifying or
adding new DDL support.
I am not sure whether it is possible, but the code pg_dump depends on could be moved to
the common/ folder to avoid the duplication.
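
Purely as an illustration of that idea (no such header exists today), a shared interface under
src/common/ might look something like the declarations below; the function names are
hypothetical:

    /*
     * Hypothetical sketch of a shared src/common/ interface -- nothing like this
     * exists today.  The point is only that pg_dump and the backend's subscription
     * code could build object definitions through one code path.
     */
    #include "libpq-fe.h"

    /* Build the CREATE statement for one table by reading the remote catalogs. */
    extern char *common_deparse_table(PGconn *conn,
                                      const char *nspname, const char *relname);

    /* Likewise for indexes, sequences, and the other object types pg_dump handles. */
    extern char *common_deparse_index(PGconn *conn, const char *qualified_name);
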

Regards
Sachin
