Quick Links

Re: Add support for specifying tables in pg_createsubscriber.

From:	Shubham Khanna <khannashubham1197(at)gmail(dot)com>
To:	Peter Smith <smithpb2250(at)gmail(dot)com>
Cc:	"Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Add support for specifying tables in pg_createsubscriber.
Date:	2025-08-21 09:08:33
Message-ID:	CAHv8RjL+cSsXeGfsBSEu6eCDuR_cqs3qeMXWTu22pK0B-5De_Q@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Aug 15, 2025 at 12:46 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Shubham,
>
> Some review comments for patch v2-0001
>
> ======
> 1. General - compile errors!
>
> Patch applies OK, but I cannot build pg_createsubscriber. e.g.
>
> pg_createsubscriber.c: In function ‘main’:
> pg_createsubscriber.c:2237:5: error: a label can only be part of a
> statement and a declaration is not a statement
> TableListPerDB * newdb;
> ^
> pg_createsubscriber.c:2324:5: error: a label can only be part of a
> statement and a declaration is not a statement
> TableSpec * ts = pg_malloc0(sizeof(TableSpec));
> ^
> pg_createsubscriber.c:2325:5: error: expected expression before ‘PQExpBuffer’
> PQExpBuffer dbbuf;
> ^
> pg_createsubscriber.c:2326:5: warning: ISO C90 forbids mixed
> declarations and code [-Wdeclaration-after-statement]
> PQExpBuffer schemabuf;
> ^
> pg_createsubscriber.c:2335:5: error: ‘dbbuf’ undeclared (first use in
> this function)
> dbbuf = createPQExpBuffer();
>
> I'm not sure how this can be working for you (???). You cannot declare
> variables without introducing a code block in the scope of the switch
> case.
>
Fixed.

> ======
> doc/src/sgml/ref/pg_createsubscriber.sgml
>
> 2.
> --table is missing from synopsis
>
> ~~~

Fixed.

>
> 3.
> + <term><option>--table=<replaceable
> class="parameter">table</replaceable></option></term>
> + <listitem>
> + <para>
> + Adds a table to be included in the publication for the most recently
> + specified database. Can be repeated multiple times. The syntax
> + supports optional column lists and WHERE clauses.
> + </para>
> + </listitem>
> + </varlistentry>
>
> Lacks detail, so I can't tell how to use this. e.g:
> - What does the syntax actually look like?
> - code suggests you can specify DB, but this doc says it only applies
> to the most recent DB
> - how supports column lists and WHERE clause (needs examples)
> - needs rules FOR ALL TABLES, etc.
> - allowed in combination with --all?
> - etc.
>

Fixed.

> ======
> src/bin/pg_basebackup/pg_createsubscriber.c
>
> 4.
> +typedef struct TableListPerDB
> +{
> + char *dbname;
> + TableSpec *tables;
> + struct TableListPerDB *next;
> +} TableListPerDB;
>
> I didn't understand the need for this "per-DB" structure.
>
> Later, you declare "TableSpec *tables;" within the "LogicalRepInfo"
> structure (which is per-DB) so you already have the "per-DB" table
> list right there. Even if you need to maintain some global static
> list, then I imagine you could just put a 'dbname' member in
> TableSpec. You don't need a whole new structure to do it.
>
> ~~~
>

Removed the structure TableListPerDB as suggested.

> create_publication:
>
> 5.
> + if (dbinfo->tables == NULL)
> + appendPQExpBuffer(str, "CREATE PUBLICATION %s FOR ALL TABLES", ipubname_esc);
> + else
> + {
> + bool first = true;
> +
> + appendPQExpBuffer(str, "CREATE PUBLICATION %s FOR TABLE ", ipubname_esc);
> + for (TableSpec * tbl = dbinfo->tables; tbl != NULL; tbl = tbl->next)
>
> What if '*' is specified for the table name? Should that cause a
> "CREATE PUBLICATION ... FOR TABLES IN SCHEMA ..." instead of making a
> publication with 100s or more tables in a FOR TABLES?
>
> ~~~

When the user specifies '*' for the table name in pg_createsubscriber,
it makes sense to generate a SQL publication command using FOR TABLES
IN SCHEMA instead of enumerating all tables explicitly. This approach
is more efficient and scalable, especially when dealing with schemas
containing hundreds or more tables, and it ensures future tables added
to the schema are automatically included without needing to recreate
the publication.
Currently, the provided patches do not implement this behavior and
always enumerate tables in the FOR TABLE clause regardless of
wildcards. I acknowledge this and plan to address the handling of the
'*' wildcard and generate FOR TABLES IN SCHEMA publications in a
future update.

>
> 6.
> + for (int i = 0; i < PQntuples(tres); i++)
> + {
> + char *escaped_schema = PQescapeIdentifier(conn, PQgetvalue(tres, i, 0),
> + strlen(PQgetvalue(tres, i, 0)));
> + char *escaped_table = PQescapeIdentifier(conn, PQgetvalue(tres, i, 1),
> + strlen(PQgetvalue(tres, i, 1)));
> +
> + appendPQExpBuffer(str, "%s%s.%s", first ? "" : ", ",
> + escaped_schema, escaped_table);
> +
> + PQfreemem(escaped_schema);
> + PQfreemem(escaped_table);
> + first = false;
> + }
>
> 6a.
> How about some other simple variables to avoid all the repeated PQgetvalue?
>
> e.g.
> char *sch = PQgetvalue(tres, i, 0);
> char *tbl = PQgetvalue(tres, i, 1);
> char *escaped_schema = PQescapeIdentifier(conn, sch, strlen(sch));
> char *escaped_table = PQescapeIdentifier(conn, tbl, strlen(tbl));
>
> ~
>

Fixed.

> 6b.
> Variable 'first' is redundant. Same as checking 'i == 0'.
>
> ~~~
>

Fixed.

> 7.
> + if (dry_run)
> + {
> + res = PQexec(conn, "BEGIN");
> + if (PQresultStatus(res) != PGRES_COMMAND_OK)
> + {
>
> Would it be better to use if/else instead of:
>
> if (dry_run)
> if (!dry_run)
>
> ~~~
>

Fixed.

> main:
>
> 8.
> + TableListPerDB * newdb;
> if (!simple_string_list_member(&opt.database_names, optarg))
> {
> simple_string_list_append(&opt.database_names, optarg);
>
>
> 8a.
> Compile error. Need {} scope to declare that variable!
>
> ~
>

Fixed.

> 8b.
> This seems a strange muddle of 'd' which represents DATABASE and
> TableListPerDB, which is a list of tables (not a database). I have
> doubts that most of this linked list code is even necessary for the
> 'd' case.
>
> ~~~

Fixed.

>
> 9.
> + case 6:
> + TableSpec * ts = pg_malloc0(sizeof(TableSpec));
> + PQExpBuffer dbbuf;
>
> Compile error. Need {} scope to declare that variable!
>
> ~~~
>

Fixed.

> 10.
> + if (dotcnt == 2)
> + {
> + ts->pattern_db_regex = NULL;
> + ts->pattern_schema_regex = pg_strdup(schemabuf->data);
> + ts->pattern_table_regex = pg_strdup(namebuf->data);
> + }
> + else if (dotcnt == 1)
> + {
> + ts->pattern_db_regex = NULL;
> + ts->pattern_schema_regex = pg_strdup(dbbuf->data);
> + ts->pattern_table_regex = pg_strdup(schemabuf->data);
> + }
> + else
> + pg_fatal("invalid --table specification: %s", optarg);
>
> 10a.
> This code seems quite odd to me:
> - The DB becomes the schema?
> - The schema becomes the table?
>
> Rather than fudging all the names of the --table parts, if you don't
> know what they represent up-fron,t probably it is better to call them
> just part1,part2,part3.
>
> ~~~

Fixed.

>
> 10b.
> Why is pattern_db_regex always NULL? If it is always NULL why have it at all?
>

Removed.

>
> ======
> .../t/040_pg_createsubscriber.pl
>
> 11.
> +# Test: Table-level publication creation
> +$node_p->safe_psql($db3, "CREATE TABLE public.t1 (id int, val text)");
> +$node_p->safe_psql($db3, "CREATE TABLE public.t2 (id int, val text)");
> +$node_p->safe_psql($db4,
> + "CREATE TABLE public.t3 (id int, val text, extra int)");
> +
>
> IIUC, the schema name is part of the table syntax. So, you should
> include test cases for different schemas.
>
> ~~~

Fixed.

>
> 12.
> +# Run pg_createsubscriber with table-level options
> +command_ok(
> + [
> + 'pg_createsubscriber',
> + '--verbose',
> + '--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default,
> + '--pgdata' => $node_s2->data_dir,
> + '--publisher-server' => $node_p->connstr($db3),
> + '--socketdir' => $node_s2->host,
> + '--subscriber-port' => $node_s2->port,
> + '--database' => $db3,
> + '--table' => "$db3.public.t1",
> + '--table' => "$db3.public.t2",
> + '--database' => $db4,
> + '--table' => "$db4.public.t3",
> + ],
> + 'pg_createsubscriber runs with table-level publication (existing nodes)');
>
>
> 12a.
> This is not really testing the same as what the commit message
> describes. e.g. what about a test case where --table does not mention
> the db explicitly, so relies on the most recent.
>
> ~
>
> 12b.
> What should happen if the explicitly named DB in --table is not the
> same as the most recent --database, even though it is otherwise
> correct?
>
> e.g.
> '--database' => $db3,
> '--table' => "$db3.public.t1",
> '--database' => $db4,
> '--table' => "$db3.public.t2",
> '--table' => "$db4.public.t3",
> I quickly tried it and AFAICT this was silently accepted and then the
> test failed because it gave unexpected results. It doesn't seem good
> behaviour.
>
> ~
>
> 12c.
>
> (related to 12b/12c).
>
> I became suspicious that the DB name part of the --table option is
> completely bogus. And it is. The test still passes OK even after I
> write junk in the --table database part, like below.
>
> command_ok(
> [
> 'pg_createsubscriber',
> '--verbose',
> '--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default,
> '--pgdata' => $node_s2->data_dir,
> '--publisher-server' => $node_p->connstr($db3),
> '--socketdir' => $node_s2->host,
> '--subscriber-port' => $node_s2->port,
> '--database' => $db3,
> '--table' => "$db3.public.t1",
> '--table' => "REALLY???.public.t2",
> '--database' => $db4,
> '--table' => "$db4.public.t3",
> ],
> 'pg_createsubscriber runs with table-level publication (existing nodes)');
> ~~~
>

v3-0002 patch handles the comment 12.

> 13.
> +# Get the publication name created by pg_createsubscriber for db3
> +my $pubname1 = $node_p->safe_psql(
> + $db3, qq(
> + SELECT pubname FROM pg_publication
> + WHERE pubname LIKE 'pg_createsubscriber_%'
> + ORDER BY pubname LIMIT 1
> +));
> +
>
> Why don't you just name the publication explicitly in the command?
> Then you don't need any of this code to discover the publication name
> here.
>
> ~~~

Fixed.

>
> 14.
> +# Check publication tables for db3
> +my $actual1 = $node_p->safe_psql(
> + $db3, qq(
> + SELECT pubname || '|public|' || tablename
> + FROM pg_publication_tables
> + WHERE pubname = '$pubname1'
> + ORDER BY tablename
> +));
> +is($actual1, "$pubname1|public|t1\n$pubname1|public|t2",
> + 'single publication for both tables created successfully on database db3'
> +);
>
> What is the point of hardwiring the 'public' in the concatenated
> string, and then verifying that it is still there in the result? Why
> not hardwire 'banana' instead of 'public' -- it passes the test just
> the same.
>
> ~~~
>

Fixed.

> 15.
> +# Get the publication name created by pg_createsubscriber for db4
> +my $pubname2 = $node_p->safe_psql(
> + $db4, qq(
> + SELECT pubname FROM pg_publication
> + WHERE pubname LIKE 'pg_createsubscriber_%'
> + ORDER BY pubname LIMIT 1
> +));
> +
>
> (same as #13 before)
>
> Why don't you simply name the publication explicitly in the command?
> Then you don't need any of this code to discover the publication name
> here.
>
> ~~~
>

Fixed.

> 16.
> +# Check publication tables for db4
> +my $actual2 = $node_p->safe_psql(
> + $db4, qq(
> + SELECT pubname || '|public|' || tablename
> + FROM pg_publication_tables
> + WHERE pubname = '$pubname2'
> + ORDER BY tablename
> +));
> +is($actual2, "$pubname2|public|t3",
> + 'single publication for t3 created successfully on database db4');
> +
>
> (same as #14 before)
>
> What is the point of the hardwired 'public'?
>

Fixed.

The attached patches contain the suggested changes. They also address
the comments given by Peter at [1].
[1] - https://www.postgresql.org/message-id/CAHut%2BPuG8Vd%3DMNbQyN-3D1nsfEatmcd5bG6%2BL-GnOyoR9tC_6w%40mail.gmail.com

Thanks and regards,
Shubham Khanna.

Attachment	Content-Type	Size
v3-0001-Support-tables-via-pg_createsubscriber.patch	application/octet-stream	16.4 KB
v3-0002-Support-WHERE-clause-and-COLUMN-list-in-table-arg.patch	application/octet-stream	3.4 KB

In response to

Re: Add support for specifying tables in pg_createsubscriber. at 2025-08-15 07:15:38 from Peter Smith

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Amit Kapila	2025-08-21 09:13:05	Re: doc patch: correct function name for slot synchronization.
Previous Message	Mihail Nikalayeu	2025-08-21 08:55:02	Re: Conflict detection for update_deleted in logical replication