From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
---|---|
To: | Shubham Khanna <khannashubham1197(at)gmail(dot)com> |
Cc: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Add support for specifying tables in pg_createsubscriber. |
Date: | 2025-08-22 07:06:42 |
Message-ID: | CAHut+Pss+yoey-bpbYV2C+F7Cw1hAsL6Ngp7k5VhbCpVOSLbCQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Shubham.
Some review comments for patch v3-0001.
======
Commit message
1.
Allows optional column lists and row filters.
~
Is this right? Aren't the column lists and row filters now separated
in patch 0002?
======
doc/src/sgml/ref/pg_createsubscriber.sgml
2. synopsis
pg_createsubscriber [option...] { -d | --database }dbname { -D |
--pgdata }datadir { -P | --publisher-server }connstr { --table
}table-name
2a.
The --table should follow the --database instead of being last.
~
2b.
The brackets {--table} don't seem useful
~
2c.
Since there can be multiple tables per database, I feel an ellipsis
(...) is needed too
~
3.
+ <varlistentry>
+ <term><option>--table=<replaceable
class="parameter">table</replaceable></option></term>
+ <listitem>
This belongs more naturally immediately following --database.
~~~
4.
+ <para>
+ The argument must be a fully qualified table name in one of the
+ following forms:
+ <itemizedlist><listitem><para><literal>schema.table</literal></para></listitem>
+ <listitem><para><literal>db.schema.table</literal></para></listitem></itemizedlist>
+ If the database name is provided, it must match the most recent
+ <option>--database</option> argument.
+ </para>
4a.
It would be nicer if the SGML were less compacted.
e.g.
<itemizedlist>
<listitem><para><literal>schema.table</literal></para></listitem>
<listitem><para><literal>db.schema.table</literal></para></listitem>
</itemizedlist>
~
4b.
Why do we insist that the user must explicitly give a schema name?
Can't a missing schema just imply "public" so user doesn't have to
type so much?
~
4c.
FUNDAMENTAL SPEC QUESTION....
I am confused why this patch allows specifying the 'db' name part of
--table when you also insist it must be identical to the most recent
--database name. With this rule, it seems unnecessary and merely means
more validation code is required.
I can understand having the 'db' name part might be useful if you
would also permit the --table to be specified anywhere on the command,
but insisting it must match the most recent --database name seems to
cancel out any reason for allowing it in the first place.
~~~
5.
+ <para>
+ A table specification may also include an optional column list and/or
+ row filter:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>schema.table(col1, col2, ...)</literal> — publishes
+ only the specified columns.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>schema.table WHERE (predicate)</literal> — publishes
+ only rows that satisfy the given condition.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Both forms can be combined, e.g.
+ <literal>schema.table(col1, col2) WHERE (id > 100)</literal>.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
AFAIK, support for this was separated into patch 0002. So these docs
should be in patch 0002, not here.
======
src/bin/pg_basebackup/pg_createsubscriber.c
6.
+typedef struct TableSpec
+{
+ char *spec;
+ char *dbname;
+ char *pattern_regex;
+ char *pattern_part1_regex;
+ char *pattern_part2_regex;
+ char *pattern_part3_regex;
+ struct TableSpec *next;
+} TableSpec;
+
6a.
Add a typedefs.list entry for this new typedef, and run pg_indent.
~
6b.
pattern_regex is unused?
~
6c.
pattern_part1_regex seems assigned, but it is otherwise unused (??). Needed?
~
6d.
I had previously given a review comment suggesting names like
part1/2/3 because previously, you were using members with different
meanings depending on the tablespec. But, now it seems all the dot
parse logic is re-written, so AFAICT you are always using part1 means
dbregex; part2 means schemaregex; part3 means tableregex ... So, the
"part" names are strange in the current impl - if you really do know
what they represent, then call them by their proper names.
~~~
7.
+static TableSpec * table_list_head = NULL;
+static TableSpec * table_list_tail = NULL;
+static char *current_dbname = NULL;
7a.
Why isn't 'current_dbname' just a local variable of main()?
~
7b.
I doubt the 'tail' var is needed. See a later review comment for details.
~~~
8.
-
/*
* Cleanup objects that were created by pg_createsubscriber if there is an
* error.
Whitespace change unrelated to this patch?
~~~
usage:
9.
printf(_(" --subscription=NAME subscription name\n"));
+ printf(_(" --table table to subscribe to;
can be specified multiple times\n"));
printf(_(" -V, --version output version
information, then exit\n"));
Or, should this say "table to publish". Maybe it amounts to the same
thing, but this patch modifies CREATE PUBLICATION, not CREATE
SUBSCRIPTION.
~~~
store_pub_sub_info:
10.
+ for (i = 0; i < num_dbs; i++)
+ {
+ TableSpec *prev = NULL;
+ TableSpec *cur = table_list_head;
+ TableSpec *filtered_head = NULL;
+ TableSpec *filtered_tail = NULL;
+
+ while (cur != NULL)
+ {
+ TableSpec *next = cur->next;
+
+ if (strcmp(cur->dbname, dbinfo[i].dbname) == 0)
+ {
+ if (prev)
+ prev->next = next;
+ else
+ table_list_head = next;
+
+ cur->next = NULL;
+ if (!filtered_head)
+ filtered_head = filtered_tail = cur;
+ else
+ {
+ filtered_tail->next = cur;
+ filtered_tail = cur;
+ }
+ }
+ else
+ prev = cur;
+ cur = next;
+ }
+ dbinfo[i].tables = filtered_head;
+ }
+
10a.
Is there a bug? I have not debugged this, but when you are assigning
filtered_tail = cur, who is ensuring that "filtered" list is
NULL-terminated? e.g. I suspect the cur->next might still point to
something inappropriate.
~
10b.
I doubt all that 'filered_head/tail' logic is needed at all. Can't you
just assign directly to the head of dbinfo[i].tables list as you
encounter appropriate tables for that db? The order may become
reversed, but does that matter?
e.g. something like this
cur->next = dbinfo[i].tables ? dbinfo[i].tables.next : NULL;
dbinfo[i].tables = cur;
~~~
create_publication:
11.
PGresult *res;
char *ipubname_esc;
char *spubname_esc;
+ bool first_table = true;
This 'first_table' variable is redundant. Just check i == 0 in the loop.
~~~
12.
+ appendPQExpBuffer(str, "%s%s.%s",
+ first_table ? "" : ", ",
+ escaped_schema, escaped_table);
SUGGESTION
if (i > 0)
appendPQExpBuffer(str, ", ");
appendPQExpBuffer(str, "%s.%s", escaped_schema, escaped_table);
~~~
13.
- if (!simple_string_list_member(&opt.database_names, optarg))
{
- simple_string_list_append(&opt.database_names, optarg);
- num_dbs++;
+ if (current_dbname)
+ pg_free(current_dbname);
+ current_dbname = pg_strdup(optarg);
+
+ if (!simple_string_list_member(&opt.database_names, optarg))
+ {
+ simple_string_list_append(&opt.database_names, optarg);
+ num_dbs++;
+ }
+ else
+ pg_fatal("database \"%s\" specified more than once for
-d/--database", optarg);
+ break;
}
- else
- pg_fatal("database \"%s\" specified more than once for
-d/--database", optarg);
- break;
13a.
The scope {} is not needed. You removed the previously declared variable.
~
13b.
The pfree might be overkill. I think a few little database name
strdups leaks are hardly going to be a problem. It's up to you.
~~~
14.
+ char *copy_arg;
+ char *first_dot;
+ char *second_dot;
+ char *dbname_arg = NULL;
+ char *schema_table_part;
+ TableSpec *ts;
Does /dbname_arg/dbname_part/make more sense here?
~~~
15.
+ if (first_dot != NULL)
+ second_dot = strchr(first_dot + 1, '.');
+ else
+ second_dot = NULL;
+
+
The 'else' is not needed if you had assigned NULL at the declaration.
~~~
16.
The logic seems quite brittle. e.g. Maybe you should only parse
looking for dots to the end of the copy_arg or the first space.
Otherwise, other "dots" in the table spec will mess up your logic.
e.g.
"db.sch.tbl" -> 2 dots
"sch.tbl WHERE (x > 1.0)" -> also 2 dots
Similarly, what you called 'schema_table_part' cannot be allowed to
extend beyond any space; otherwise, it can easily return an unexpected
dotcnt.
"db.sch.tbl WHERE (x > 1.0)" -> 3 dots
~~~
17.
+ patternToSQLRegex(encoding, dbbuf, schemabuf, namebuf,
+ schema_table_part, false, false, &dotcnt);
+
+ if (dbname_arg != NULL)
+ dotcnt++;
+
+ if (dotcnt == 2)
+ {
+ ts->pattern_part1_regex = pg_strdup(dbbuf->data);
+ ts->pattern_part2_regex = pg_strdup(schemabuf->data);
+ ts->pattern_part3_regex = namebuf->len > 0 ? pg_strdup(namebuf->data) : NULL;
+ }
+ else if (dotcnt == 1)
+ {
+ ts->pattern_part1_regex = NULL;
+ ts->pattern_part2_regex = pg_strdup(dbbuf->data);
+ ts->pattern_part3_regex = NULL;
+ }
+ else
+ pg_fatal("invalid table specification \"%s\"", optarg);
Something seems very strange here...
I haven't debugged this, but I imagine if "--table sch.tab" then
dotcnt will be 1.
But then
+ ts->pattern_part1_regex = NULL;
+ ts->pattern_part2_regex = pg_strdup(dbbuf->data);
+ ts->pattern_part3_regex = NULL;
The schema regex is assigned the dbbuf->data (???) I don't trust this
logic. I saw that there are no test cases where dbpart is omitted, so
maybe this is a lurking bug?
~~~
18.
+
+ if (!table_list_head)
+ table_list_head = table_list_tail = ts;
+ else
+ table_list_tail->next = ts;
+
+ table_list_tail = ts;
+
All this 'tail' stuff looks potentially unnecessary. e.g. I think you
could simplify all this just by adding to the front of the list.
SUGGESTION
ts->next = table_list_head;
table_list_head = ts;
======
.../t/040_pg_createsubscriber.pl
19.
+# Declare database names
+my $db3 = 'db3';
+my $db4 = 'db4';
+
What do these variables achieve? e.g. Why not just use hardcoded
strings 'db1' and 'db2'.
~~~
20.
+# Test: Table-level publication creation
+$node_p->safe_psql($db3, "CREATE TABLE public.t1 (id int, val text)");
+$node_p->safe_psql($db3, "CREATE TABLE public.t2 (id int, val text)");
+$node_p->safe_psql($db3, "CREATE TABLE myschema.t4 (id int, val text)");
You could combine all these.
~~~
21.
+$node_p->safe_psql($db4,
+ "CREATE TABLE public.t3 (id int, val text, extra int)");
+$node_p->safe_psql($db4,
+ "CREATE TABLE otherschema.t5 (id serial primary key, info text)");
You could combine these.
~~~
22.
+# Create explicit publications
+my $pubname1 = 'pub1';
+my $pubname2 = 'pub2';
+
What does creating these variables achieve? e.g. Why not just use
hardcoded strings 'pub1' and 'pub2'.
~~~
23.
+# Run pg_createsubscriber with table-level options
+command_ok(
+ [
+ 'pg_createsubscriber',
+ '--verbose',
+ '--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default,
+ '--pgdata' => $node_s2->data_dir,
+ '--publisher-server' => $node_p->connstr($db3),
+ '--socketdir' => $node_s2->host,
+ '--subscriber-port' => $node_s2->port,
+ '--publication' => $pubname1,
+ '--publication' => $pubname2,
+ '--database' => $db3,
+ '--table' => "$db3.public.t1",
+ '--table' => "$db3.public.t2",
+ '--table' => "$db3.myschema.t4",
+ '--database' => $db4,
+ '--table' => "$db4.public.t3",
+ '--table' => "$db4.otherschema.t5",
+ ],
+ 'pg_createsubscriber runs with table-level publication (existing nodes)');
+
There is no test for a --table spec where the dbpart is omitted.
~~~
24.
+# Check publication tables for db3 with public schema first
What is the significance "with public schema first" in this comment?
~~
25.
+# Check publication tables for db4, with public schema first
+my $actual2 = $node_p->safe_psql(
+ $db4, qq(
+ SELECT pubname || '|' || schemaname || '|' || tablename
+ FROM pg_publication_tables
+ WHERE pubname = '$pubname2'
+ ORDER BY schemaname, tablename
+ )
+);
+is( $actual2,
+ "$pubname2|otherschema|t5\n$pubname2|public|t3",
+ 'publication includes tables in public and otherschema schemas on db4');
+
Ditto. What is the significance "with public schema first" in this comment?
======
Kind Regards,
Peter Smith.
Fujitsu Australia
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Smith | 2025-08-22 07:10:32 | Re: Add support for specifying tables in pg_createsubscriber. |
Previous Message | Peter Eisentraut | 2025-08-22 07:02:22 | Re: make VALIDATE domain constraint lock on related relations as ShareUpdateExclusiveLock |