Re: Initial Schema Sync for Logical Replication

From: "Euler Taveira" <euler(at)eulerto(dot)com>
To: "Kumar, Sachin" <ssetiya(at)amazon(dot)com>, "Alvaro Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Initial Schema Sync for Logical Replication
Date: 2023-03-21 02:01:32
Message-ID: 22c275cc-a2d5-4cee-ac81-bbb6950248b8@app.fastmail.com
Lists: pgsql-hackers

On Mon, Mar 20, 2023, at 10:10 PM, Kumar, Sachin wrote:
> > From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
> > Subject: RE: [EXTERNAL]Initial Schema Sync for Logical Replication
> > On 2023-Mar-15, Kumar, Sachin wrote:
> >
> > > 1. In CreateSubscription() when we create replication
> > > slot(walrcv_create_slot()), should use CRS_EXPORT_SNAPSHOT, So that we
> > > can use this snapshot later in the pg_dump.
> > >
> > > 2. Now we can call pg_dump with above snapshot from CreateSubscription.
> >
> > Overall I'm not on board with the idea that logical replication would depend on
> > pg_dump; that seems like it could run into all sorts of trouble (what if calling
> > external binaries requires additional security setup? what about pg_hba
> > connection requirements? what about max_connections in tight
> > circumstances?).
> > what if calling external binaries requires additional security setup
> I am not sure what kind of security restriction would apply in this case, maybe pg_dump
> binary can be changed ?
Using pg_dump as part of this implementation is not acceptable because we
expect the backend to be decoupled from the client. Besides that, pg_dump
emits all table dependencies (such as tablespaces, privileges, security
labels, comments), and not all of them should be replicated. You would have to
exclude them, either by removing these objects from the TOC before running
pg_restore or by adding a few pg_dump options. Another issue is version skew:
if the publisher runs a newer version than the subscriber, new table syntax
can easily break your logical replication setup. IMO pg_dump doesn't seem like
a good solution for initial synchronization.
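
To make the TOC workaround concrete, here is a rough sketch (not part of any
patch) of the filtering it would need. It assumes the list format printed by
`pg_restore -l`, where each entry looks like `1862; 0 0 ACL - public pasha`,
and it comments out unwanted entries with a leading `;` so the file can be fed
back through `pg_restore -L`. The set of excluded entry types and the function
name `filter_toc` are assumptions made up for illustration.

```python
import re

# Entry types to drop. This particular set is an assumption for illustration.
EXCLUDED_TYPES = ("COMMENT", "ACL", "SECURITY LABEL")

# One TOC list entry: "<dump id>; <catalog oid> <object oid> <type and name...>"
TOC_LINE = re.compile(r"^(\d+); (\d+) (\d+) (.*)$")


def filter_toc(lines):
    """Return the TOC list with excluded entries commented out by ';'."""
    prefixes = tuple(t + " " for t in EXCLUDED_TYPES)
    out = []
    for line in lines:
        m = TOC_LINE.match(line)
        if m and m.group(4).startswith(prefixes):
            # pg_restore -L treats lines starting with ';' as comments.
            out.append(";" + line)
        else:
            out.append(line)
    return out
```

Even with such a filter, the client-side dependency and the version-skew
problem remain, which is the main objection.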

Instead, the backend should provide infrastructure to obtain the required DDL
commands for a specific table (or set of tables). This would avoid the issues
described in the previous paragraph:

* you can selectively choose dependencies;
* you don't require additional client packages;
* you don't need to worry about different versions.
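
As a minimal sketch of what "selectively choose dependencies" could mean, the
toy generator below emits the DDL for one table and lets the caller opt in to
each kind of dependency. Everything here is hypothetical: `TableDef`, `Column`,
`table_ddl` and its flags are names invented for this example, not an existing
or proposed backend API, and a real implementation would read the catalogs
rather than a hand-built description.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Column:
    name: str
    type: str
    not_null: bool = False


@dataclass
class TableDef:
    # Hypothetical in-memory description of a table and its dependencies.
    schema: str
    name: str
    columns: List[Column]
    tablespace: Optional[str] = None
    comment: Optional[str] = None
    grants: List[Tuple[str, str]] = field(default_factory=list)  # (privilege, role)


def quote_ident(s: str) -> str:
    """Double-quote an identifier, doubling any embedded quotes."""
    return '"' + s.replace('"', '""') + '"'


def table_ddl(t: TableDef, *, with_tablespace=False, with_comment=False,
              with_grants=False) -> List[str]:
    """Return DDL statements for t, including only the requested dependencies."""
    qualified = f"{quote_ident(t.schema)}.{quote_ident(t.name)}"
    cols = ", ".join(
        f"{quote_ident(c.name)} {c.type}" + (" NOT NULL" if c.not_null else "")
        for c in t.columns
    )
    create = f"CREATE TABLE {qualified} ({cols})"
    if with_tablespace and t.tablespace:
        create += f" TABLESPACE {quote_ident(t.tablespace)}"
    stmts = [create + ";"]
    if with_comment and t.comment is not None:
        literal = t.comment.replace("'", "''")  # escape single quotes
        stmts.append(f"COMMENT ON TABLE {qualified} IS '{literal}';")
    if with_grants:
        for privilege, role in t.grants:
            stmts.append(
                f"GRANT {privilege} ON TABLE {qualified} TO {quote_ident(role)};"
            )
    return stmts
```

With all flags left at their defaults, the caller gets only the bare
CREATE TABLE statement, which is roughly what initial schema sync for logical
replication would want.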

This infrastructure can also be useful for other use cases such as:

* client tools that provide create commands (such as psql, pgAdmin);
* other logical replication solutions;
* other logical backup solutions.

--
Euler Taveira
EDB https://www.enterprisedb.com/
