Quick Links

Re: Add support for specifying tables in pg_createsubscriber.

From:	Peter Smith <smithpb2250(at)gmail(dot)com>
To:	Shubham Khanna <khannashubham1197(at)gmail(dot)com>
Cc:	"Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Add support for specifying tables in pg_createsubscriber.
Date:	2025-08-22 07:06:42
Message-ID:	CAHut+Pss+yoey-bpbYV2C+F7Cw1hAsL6Ngp7k5VhbCpVOSLbCQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi Shubham.

Some review comments for patch v3-0001.

======
Commit message

1.
Allows optional column lists and row filters.

Is this right? Aren't the column lists and row filters now separated
in patch 0002?

======
doc/src/sgml/ref/pg_createsubscriber.sgml

2. synopsis

pg_createsubscriber [option...] { -d | --database }dbname { -D |
--pgdata }datadir { -P | --publisher-server }connstr { --table
}table-name

2a.
The --table should follow the --database instead of being last.

2b.
The brackets {--table} don't seem useful

2c.
Since there can be multiple tables per database, I feel an ellipsis
(...) is needed too

3.
+ <varlistentry>
+ <term><option>--table=<replaceable
class="parameter">table</replaceable></option></term>
+ <listitem>

This belongs more naturally immediately following --database.

~~~

4.
+ <para>
+ The argument must be a fully qualified table name in one of the
+ following forms:
+ <itemizedlist><listitem><para><literal>schema.table</literal></para></listitem>
+ <listitem><para><literal>db.schema.table</literal></para></listitem></itemizedlist>
+ If the database name is provided, it must match the most recent
+ <option>--database</option> argument.
+ </para>

4a.
It would be nicer if the SGML were less compacted.
e.g.
<itemizedlist>
<listitem><para><literal>schema.table</literal></para></listitem>
<listitem><para><literal>db.schema.table</literal></para></listitem>
</itemizedlist>

4b.
Why do we insist that the user must explicitly give a schema name?
Can't a missing schema just imply "public" so user doesn't have to
type so much?

4c.
FUNDAMENTAL SPEC QUESTION....

I am confused why this patch allows specifying the 'db' name part of
--table when you also insist it must be identical to the most recent
--database name. With this rule, it seems unnecessary and merely means
more validation code is required.

I can understand having the 'db' name part might be useful if you
would also permit the --table to be specified anywhere on the command,
but insisting it must match the most recent --database name seems to
cancel out any reason for allowing it in the first place.

~~~

5.
+ <para>
+ A table specification may also include an optional column list and/or
+ row filter:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>schema.table(col1, col2, ...)</literal> — publishes
+ only the specified columns.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>schema.table WHERE (predicate)</literal> — publishes
+ only rows that satisfy the given condition.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Both forms can be combined, e.g.
+ <literal>schema.table(col1, col2) WHERE (id > 100)</literal>.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+

AFAIK, support for this was separated into patch 0002. So these docs
should be in patch 0002, not here.

======
src/bin/pg_basebackup/pg_createsubscriber.c

6.
+typedef struct TableSpec
+{
+ char *spec;
+ char *dbname;
+ char *pattern_regex;
+ char *pattern_part1_regex;
+ char *pattern_part2_regex;
+ char *pattern_part3_regex;
+ struct TableSpec *next;
+} TableSpec;
+

6a.
Add a typedefs.list entry for this new typedef, and run pg_indent.

6b.
pattern_regex is unused?

6c.
pattern_part1_regex seems assigned, but it is otherwise unused (??). Needed?

6d.
I had previously given a review comment suggesting names like
part1/2/3 because previously, you were using members with different
meanings depending on the tablespec. But, now it seems all the dot
parse logic is re-written, so AFAICT you are always using part1 means
dbregex; part2 means schemaregex; part3 means tableregex ... So, the
"part" names are strange in the current impl - if you really do know
what they represent, then call them by their proper names.

~~~

7.
+static TableSpec * table_list_head = NULL;
+static TableSpec * table_list_tail = NULL;
+static char *current_dbname = NULL;

7a.
Why isn't 'current_dbname' just a local variable of main()?

7b.
I doubt the 'tail' var is needed. See a later review comment for details.

~~~

8.
-
/*
* Cleanup objects that were created by pg_createsubscriber if there is an
* error.

Whitespace change unrelated to this patch?

~~~

usage:

9.
printf(_(" --subscription=NAME subscription name\n"));
+ printf(_(" --table table to subscribe to;
can be specified multiple times\n"));
printf(_(" -V, --version output version
information, then exit\n"));

Or, should this say "table to publish". Maybe it amounts to the same
thing, but this patch modifies CREATE PUBLICATION, not CREATE
SUBSCRIPTION.

~~~
store_pub_sub_info:

10.
+ for (i = 0; i < num_dbs; i++)
+ {
+ TableSpec *prev = NULL;
+ TableSpec *cur = table_list_head;
+ TableSpec *filtered_head = NULL;
+ TableSpec *filtered_tail = NULL;
+
+ while (cur != NULL)
+ {
+ TableSpec *next = cur->next;
+
+ if (strcmp(cur->dbname, dbinfo[i].dbname) == 0)
+ {
+ if (prev)
+ prev->next = next;
+ else
+ table_list_head = next;
+
+ cur->next = NULL;
+ if (!filtered_head)
+ filtered_head = filtered_tail = cur;
+ else
+ {
+ filtered_tail->next = cur;
+ filtered_tail = cur;
+ }
+ }
+ else
+ prev = cur;
+ cur = next;
+ }
+ dbinfo[i].tables = filtered_head;
+ }
+

10a.
Is there a bug? I have not debugged this, but when you are assigning
filtered_tail = cur, who is ensuring that "filtered" list is
NULL-terminated? e.g. I suspect the cur->next might still point to
something inappropriate.

10b.
I doubt all that 'filered_head/tail' logic is needed at all. Can't you
just assign directly to the head of dbinfo[i].tables list as you
encounter appropriate tables for that db? The order may become
reversed, but does that matter?

e.g. something like this
cur->next = dbinfo[i].tables ? dbinfo[i].tables.next : NULL;
dbinfo[i].tables = cur;

~~~

create_publication:

11.
PGresult *res;
char *ipubname_esc;
char *spubname_esc;
+ bool first_table = true;

This 'first_table' variable is redundant. Just check i == 0 in the loop.

~~~

12.
+ appendPQExpBuffer(str, "%s%s.%s",
+ first_table ? "" : ", ",
+ escaped_schema, escaped_table);

SUGGESTION
if (i > 0)
appendPQExpBuffer(str, ", ");

appendPQExpBuffer(str, "%s.%s", escaped_schema, escaped_table);

~~~

13.
- if (!simple_string_list_member(&opt.database_names, optarg))
{
- simple_string_list_append(&opt.database_names, optarg);
- num_dbs++;
+ if (current_dbname)
+ pg_free(current_dbname);
+ current_dbname = pg_strdup(optarg);
+
+ if (!simple_string_list_member(&opt.database_names, optarg))
+ {
+ simple_string_list_append(&opt.database_names, optarg);
+ num_dbs++;
+ }
+ else
+ pg_fatal("database \"%s\" specified more than once for
-d/--database", optarg);
+ break;
}
- else
- pg_fatal("database \"%s\" specified more than once for
-d/--database", optarg);
- break;

13a.
The scope {} is not needed. You removed the previously declared variable.

13b.
The pfree might be overkill. I think a few little database name
strdups leaks are hardly going to be a problem. It's up to you.

~~~

14.
+ char *copy_arg;
+ char *first_dot;
+ char *second_dot;
+ char *dbname_arg = NULL;
+ char *schema_table_part;
+ TableSpec *ts;

Does /dbname_arg/dbname_part/make more sense here?

~~~

15.
+ if (first_dot != NULL)
+ second_dot = strchr(first_dot + 1, '.');
+ else
+ second_dot = NULL;
+
+

The 'else' is not needed if you had assigned NULL at the declaration.

~~~

16.
The logic seems quite brittle. e.g. Maybe you should only parse
looking for dots to the end of the copy_arg or the first space.
Otherwise, other "dots" in the table spec will mess up your logic.

e.g.
"db.sch.tbl" -> 2 dots
"sch.tbl WHERE (x > 1.0)" -> also 2 dots

Similarly, what you called 'schema_table_part' cannot be allowed to
extend beyond any space; otherwise, it can easily return an unexpected
dotcnt.

"db.sch.tbl WHERE (x > 1.0)" -> 3 dots

~~~

17.
+ patternToSQLRegex(encoding, dbbuf, schemabuf, namebuf,
+ schema_table_part, false, false, &dotcnt);
+
+ if (dbname_arg != NULL)
+ dotcnt++;
+
+ if (dotcnt == 2)
+ {
+ ts->pattern_part1_regex = pg_strdup(dbbuf->data);
+ ts->pattern_part2_regex = pg_strdup(schemabuf->data);
+ ts->pattern_part3_regex = namebuf->len > 0 ? pg_strdup(namebuf->data) : NULL;
+ }
+ else if (dotcnt == 1)
+ {
+ ts->pattern_part1_regex = NULL;
+ ts->pattern_part2_regex = pg_strdup(dbbuf->data);
+ ts->pattern_part3_regex = NULL;
+ }
+ else
+ pg_fatal("invalid table specification \"%s\"", optarg);

Something seems very strange here...

I haven't debugged this, but I imagine if "--table sch.tab" then
dotcnt will be 1.

But then
+ ts->pattern_part1_regex = NULL;
+ ts->pattern_part2_regex = pg_strdup(dbbuf->data);
+ ts->pattern_part3_regex = NULL;

The schema regex is assigned the dbbuf->data (???) I don't trust this
logic. I saw that there are no test cases where dbpart is omitted, so
maybe this is a lurking bug?

~~~

18.
+
+ if (!table_list_head)
+ table_list_head = table_list_tail = ts;
+ else
+ table_list_tail->next = ts;
+
+ table_list_tail = ts;
+

All this 'tail' stuff looks potentially unnecessary. e.g. I think you
could simplify all this just by adding to the front of the list.

SUGGESTION
ts->next = table_list_head;
table_list_head = ts;

======
.../t/040_pg_createsubscriber.pl

19.
+# Declare database names
+my $db3 = 'db3';
+my $db4 = 'db4';
+

What do these variables achieve? e.g. Why not just use hardcoded
strings 'db1' and 'db2'.

~~~

20.
+# Test: Table-level publication creation
+$node_p->safe_psql($db3, "CREATE TABLE public.t1 (id int, val text)");
+$node_p->safe_psql($db3, "CREATE TABLE public.t2 (id int, val text)");
+$node_p->safe_psql($db3, "CREATE TABLE myschema.t4 (id int, val text)");

You could combine all these.

~~~

21.
+$node_p->safe_psql($db4,
+ "CREATE TABLE public.t3 (id int, val text, extra int)");
+$node_p->safe_psql($db4,
+ "CREATE TABLE otherschema.t5 (id serial primary key, info text)");

You could combine these.

~~~

22.
+# Create explicit publications
+my $pubname1 = 'pub1';
+my $pubname2 = 'pub2';
+

What does creating these variables achieve? e.g. Why not just use
hardcoded strings 'pub1' and 'pub2'.

~~~

23.
+# Run pg_createsubscriber with table-level options
+command_ok(
+ [
+ 'pg_createsubscriber',
+ '--verbose',
+ '--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default,
+ '--pgdata' => $node_s2->data_dir,
+ '--publisher-server' => $node_p->connstr($db3),
+ '--socketdir' => $node_s2->host,
+ '--subscriber-port' => $node_s2->port,
+ '--publication' => $pubname1,
+ '--publication' => $pubname2,
+ '--database' => $db3,
+ '--table' => "$db3.public.t1",
+ '--table' => "$db3.public.t2",
+ '--table' => "$db3.myschema.t4",
+ '--database' => $db4,
+ '--table' => "$db4.public.t3",
+ '--table' => "$db4.otherschema.t5",
+ ],
+ 'pg_createsubscriber runs with table-level publication (existing nodes)');
+

There is no test for a --table spec where the dbpart is omitted.

~~~

24.
+# Check publication tables for db3 with public schema first

What is the significance "with public schema first" in this comment?

25.
+# Check publication tables for db4, with public schema first
+my $actual2 = $node_p->safe_psql(
+ $db4, qq(
+ SELECT pubname || '|' || schemaname || '|' || tablename
+ FROM pg_publication_tables
+ WHERE pubname = '$pubname2'
+ ORDER BY schemaname, tablename
+ )
+);
+is( $actual2,
+ "$pubname2|otherschema|t5\n$pubname2|public|t3",
+ 'publication includes tables in public and otherschema schemas on db4');
+

Ditto. What is the significance "with public schema first" in this comment?

======
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Re: Add support for specifying tables in pg_createsubscriber. at 2025-08-21 09:08:33 from Shubham Khanna

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Smith	2025-08-22 07:10:32	Re: Add support for specifying tables in pg_createsubscriber.
Previous Message	Peter Eisentraut	2025-08-22 07:02:22	Re: make VALIDATE domain constraint lock on related relations as ShareUpdateExclusiveLock