Re: Add support for specifying tables in pg_createsubscriber.

From: "Euler Taveira" <euler(at)eulerto(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "'Shubham Khanna'" <khannashubham1197(at)gmail(dot)com>
Cc: "vignesh C" <vignesh21(at)gmail(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Peter Smith" <smithpb2250(at)gmail(dot)com>
Subject: Re: Add support for specifying tables in pg_createsubscriber.
Date: 2025-08-22 15:26:29
Message-ID: 30cc34eb-07a0-4b55-b4fe-6c526886b2c4@app.fastmail.com
Lists: pgsql-hackers

On Fri, Aug 22, 2025, at 6:57 AM, Zhijie Hou (Fujitsu) wrote:
> The documentation appears incorrect and needs revision. The latest version no
> longer depends on the option order; instead, it requires users to provide
> database-qualified table names, such as -t "db1.sch1.tb1". This adjustment
> allows the command to internally categorize tables by their target database.
>

I don't like this design. No other tool uses a 3-element table name. It is
also confusing and redundant to have the database in the --database option and
also in the --table option.

I'm wondering if allowing the use of a specified publication is a better UI. If
you specify --publication and it exists on the primary, use it. The current
behavior is a failure if the publication exists. It changes the current behavior
but I don't expect anyone to rely on this failure to abort the execution.
Moreover, the error message was added to allow only FOR ALL TABLES; the proposal
is to relax this restriction.

> I think we can explore extending the existing --clean option in a separate patch
> to support table cleanup. This option is implemented in a way that allows adding
> further cleanup objects later, so it should be easy to extend it for table.
> Prior to this extension, it should be noted in the documentation that users are
> required to clean up the tables themselves.
>

I would say that this cleanup feature (starting with cleaning up databases) is
as important as the feature that selects specific objects.

> I agree that supporting row filter and column list is not straightforward, and
> we can consider it separately and do not implement that in the first version.
>

The proposal above would allow it with no additional lines of code.
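
As a rough illustration of why (a Python sketch, not part of any patch): a
publication created on the primary beforehand can already carry column lists
and row filters through CREATE PUBLICATION ... FOR TABLE, so a tool that simply
reuses an existing publication would need no table-level options of its own.
The PublishedTable type, the helper name, and the table and column names are
hypothetical, and identifiers are neither quoted nor escaped here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PublishedTable:
    """One FOR TABLE entry (hypothetical helper type for this sketch)."""
    name: str                          # schema-qualified table name
    columns: Sequence[str] = ()        # optional column list
    row_filter: Optional[str] = None   # optional row filter expression


def create_publication_sql(pubname: str, tables: Sequence[PublishedTable]) -> str:
    """Build a CREATE PUBLICATION statement; assumes trusted, pre-validated names."""
    items = []
    for table in tables:
        item = table.name
        if table.columns:
            # The column list comes before the WHERE clause.
            item += " (" + ", ".join(table.columns) + ")"
        if table.row_filter:
            item += " WHERE (" + table.row_filter + ")"
        items.append(item)
    return f"CREATE PUBLICATION {pubname} FOR TABLE " + ", ".join(items) + ";"


if __name__ == "__main__":
    print(create_publication_sql("pub1", [
        PublishedTable("sch1.tb1"),
        PublishedTable("sch1.tb2", columns=("id", "val"), row_filter="id > 100"),
    ]))
```

The statement would be run on the primary first; under the proposal, the
publication name is then the only thing passed via --publication.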

>>
>> It seems this proposal doesn't serve a general purpose. It is copying a *whole*
>> cluster to use only a subset of tables. Your task with pg_createsubscriber is
>> more expensive than doing a manual logical replication setup. If you have 500
>> tables and want to replicate only 400 tables, it doesn't seem productive to
>> specify 400 -t options.
>
> Specifying multiple -t options should not be problematic, as users has already
> done similar things for "FOR TABLE" publication DDLs. I think it's not hard
> for user to convert FOR TABLE list to -t option list.
>

Of course it is. Shell limits the number of arguments.
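
To make the limit concrete: on most Unix-like systems the cap applies when the
process is executed, as a bound on the combined size of the arguments and the
environment (ARG_MAX). The Python sketch below only estimates how many bytes a
list of -t options (the option proposed in this thread) would occupy and
compares that with the system value. The helper names and table names are
hypothetical, and the estimate ignores the environment and per-argument pointer
overhead, so it is a lower bound.

```python
import os
from typing import Iterable, List, Optional, Sequence


def table_args(tables: Iterable[str]) -> List[str]:
    """Expand table names into repeated -t options."""
    argv: List[str] = []
    for name in tables:
        argv += ["-t", name]
    return argv


def argv_bytes(argv: Sequence[str]) -> int:
    """Bytes taken by the argument strings, each with its terminating NUL."""
    return sum(len(arg.encode()) + 1 for arg in argv)


def arg_max() -> Optional[int]:
    """Return the system's ARG_MAX, or None if it cannot be determined."""
    try:
        value = os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        return None
    return value if value > 0 else None


if __name__ == "__main__":
    argv = table_args(f"sch1.tb{i}" for i in range(400))
    print("argument bytes (lower bound):", argv_bytes(argv))
    print("ARG_MAX:", arg_max())
```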

>> There are some cases like a small set of big tables that
>> this feature makes sense. However, I'm wondering if a post script should be
>> used to adjust your setup.
>
> I think it's not very convenient for users to perform this conversion manually.
> I've learned in PGConf.dev this year that some users avoid using
> pg_createsubscriber because they are unsure of the standard steps required to
> convert it into subset table replication. Automating this process would be
> beneficial, enabling more users to use pg_createsubscriber and take advantage of
> the rapid initial table synchronization.
>

You missed my point. I'm not talking about manually converting a physical
replica into a logical replica. I'm talking about the plain logical replication
setup (CREATE PUBLICATION, CREATE SUBSCRIPTION). IME this tool is beneficial
for large clusters in which we want to replicate (almost) all tables.

--
Euler Taveira
EDB https://www.enterprisedb.com/
