Quick Links

RE: Add support for specifying tables in pg_createsubscriber.

From:	"Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To:	Euler Taveira <euler(at)eulerto(dot)com>, 'Shubham Khanna' <khannashubham1197(at)gmail(dot)com>
Cc:	vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject:	RE: Add support for specifying tables in pg_createsubscriber.
Date:	2025-08-22 09:57:00
Message-ID:	TY4PR01MB1690743196F9FC7D38DD59389943DA@TY4PR01MB16907.jpnprd01.prod.outlook.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Friday, August 22, 2025 11:19 AM Euler Taveira <euler(at)eulerto(dot)com> wrote:
>
> On Thu, Aug 21, 2025, at 6:08 AM, Shubham Khanna wrote:
> > Attachments:
> > * v3-0001-Support-tables-via-pg_createsubscriber.patch
> > * v3-0002-Support-WHERE-clause-and-COLUMN-list-in-table-arg.patch
>
> + <term><option>--table=<replaceable
> class="parameter">table</replaceable></option></term>
> + <listitem>
> + <para>
> + Adds one or more specific tables to the publication for the most
> recently
> + specified <option>--database</option>. This option can be given
> multiple
> + times to include additional tables.
> + </para>
>
> I think it is a really bad idea to rely on the order of options to infer which tables
> are from which database. The current design about publication and
> subscription names imply order but it doesn't mean subscription name must
> be the next option after the publication name.

The documentation appears incorrect and needs revision. The latest version no
longer depends on the option order; instead, it requires users to provide
database-qualified table names, such as -t "db1.sch1.tb1". This adjustment
allows the command to internally categorize tables by their target database.

>
> Let's recap the initial goal: pg_createsubscriber creates a new logical replica
> from a physical standby server. Your proposal is extending the tool to create a
> partial logical replica but doesn't mention what you would do with the other
> part; that is garbage after the conversion. I'm not convinced that the current
> proposal is solid as-is.

I think we can explore extending the existing --clean option in a separate patch
to support table cleanup. This option is implemented in a way that allows adding
further cleanup objects later, so it should be easy to extend it for table.
Prior to this extension, it should be noted in the documentation that users are
required to clean up the tables themselves.

>
> + <para>
> + The argument must be a fully qualified table name in one of the
> + following forms:
> +
> <itemizedlist><listitem><para><literal>schema.table</literal></para></li
> stitem>
> +
> <listitem><para><literal>db.schema.table</literal></para></listitem></it
> emizedlist>
> + If the database name is provided, it must match the most recent
> + <option>--database</option> argument.
> + </para>
>
> Why do you want to include the database if you already specified it?

As mentioned earlier, this document needs to be corrected.

>
> + <para>
> + A table specification may also include an optional column list and/or
> + row filter:
> + <itemizedlist>
> + <listitem>
> + <para>
> + <literal>schema.table(col1, col2, ...)</literal> — publishes
> + only the specified columns.
> + </para>
> + </listitem>
> + <listitem>
> + <para>
> + <literal>schema.table WHERE (predicate)</literal> —
> publishes
> + only rows that satisfy the given condition.
> + </para>
> + </listitem>
> + <listitem>
> + <para>
> + Both forms can be combined, e.g.
> + <literal>schema.table(col1, col2) WHERE (id > 100)</literal>.
> + </para>
> + </listitem>
> + </itemizedlist>
> + </para>
>
> This is a bad idea for some cases. Let's say your setup involves a filter that uses
> only 10% of the rows from a certain table. It is better to do a manual setup.
> Besides that, expressions in options can open a can of worms. In the column
> list case, there might be corner cases like if you have a constraint in a certain
> column and that column was not included in the column list, the setup will fail;
> there isn't a cheap way to detect such cases.

I agree that supporting row filter and column list is not straightforward, and
we can consider it separately and do not implement that in the first version.

>
> It seems this proposal doesn't serve a general purpose. It is copying a *whole*
> cluster to use only a subset of tables. Your task with pg_createsubscriber is
> more expensive than doing a manual logical replication setup. If you have 500
> tables and want to replicate only 400 tables, it doesn't seem productive to
> specify 400 -t options.

Specifying multiple -t options should not be problematic, as users has already
done similar things for "FOR TABLE" publication DDLs. I think it's not hard
for user to convert FOR TABLE list to -t option list.

> There are some cases like a small set of big tables that
> this feature makes sense. However, I'm wondering if a post script should be
> used to adjust your setup.

I think it's not very convenient for users to perform this conversion manually.
I've learned in PGConf.dev this year that some users avoid using
pg_createsubscriber because they are unsure of the standard steps required to
convert it into subset table replication. Automating this process would be
beneficial, enabling more users to use pg_createsubscriber and take advantage of
the rapid initial table synchronization.

> There might be cases that involves only dozens of
> tables but my experience says it is rare. My expectation is that this feature is
> useful for avoiding some specific tables. Hence, the copy of the whole cluster is
> worthwhile.

We could consider adding an --exclude-table option later if needed. However, as
mentioned earlier, I think specifying multiple -t options can address this use
case as well.

Best Regards,
Hou zj

In response to

Re: Add support for specifying tables in pg_createsubscriber. at 2025-08-22 03:18:37 from Euler Taveira

Responses

Re: Add support for specifying tables in pg_createsubscriber. at 2025-08-22 15:26:29 from Euler Taveira

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Álvaro Herrera	2025-08-22 09:59:31	Re: Remove unneeded cast in heap_xlog_lock.
Previous Message	Álvaro Herrera	2025-08-22 09:46:45	Re: misleading error message in ProcessUtilitySlow T_CreateStatsStmt