From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
---|---|
To: | Shubham Khanna <khannashubham1197(at)gmail(dot)com> |
Cc: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Add support for specifying tables in pg_createsubscriber. |
Date: | 2025-08-15 07:15:38 |
Message-ID: | CAHut+PtkKLHZp5q3e-Tpmg_TmB=uCMfAzYO6SA7b5YyodwDkCg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Shubham,
Some review comments for patch v2-0001
======
1. General - compile errors!
Patch applies OK, but I cannot build pg_createsubscriber. e.g.
pg_createsubscriber.c: In function ‘main’:
pg_createsubscriber.c:2237:5: error: a label can only be part of a
statement and a declaration is not a statement
TableListPerDB * newdb;
^
pg_createsubscriber.c:2324:5: error: a label can only be part of a
statement and a declaration is not a statement
TableSpec * ts = pg_malloc0(sizeof(TableSpec));
^
pg_createsubscriber.c:2325:5: error: expected expression before ‘PQExpBuffer’
PQExpBuffer dbbuf;
^
pg_createsubscriber.c:2326:5: warning: ISO C90 forbids mixed
declarations and code [-Wdeclaration-after-statement]
PQExpBuffer schemabuf;
^
pg_createsubscriber.c:2335:5: error: ‘dbbuf’ undeclared (first use in
this function)
dbbuf = createPQExpBuffer();
I'm not sure how this can be working for you (???). You cannot declare
variables without introducing a code block in the scope of the switch
case.
======
doc/src/sgml/ref/pg_createsubscriber.sgml
2.
--table is missing from synopsis
~~~
3.
+ <term><option>--table=<replaceable
class="parameter">table</replaceable></option></term>
+ <listitem>
+ <para>
+ Adds a table to be included in the publication for the most recently
+ specified database. Can be repeated multiple times. The syntax
+ supports optional column lists and WHERE clauses.
+ </para>
+ </listitem>
+ </varlistentry>
Lacks detail, so I can't tell how to use this. e.g:
- What does the syntax actually look like?
- code suggests you can specify DB, but this doc says it only applies
to the most recent DB
- how supports column lists and WHERE clause (needs examples)
- needs rules FOR ALL TABLES, etc.
- allowed in combination with --all?
- etc.
======
src/bin/pg_basebackup/pg_createsubscriber.c
4.
+typedef struct TableListPerDB
+{
+ char *dbname;
+ TableSpec *tables;
+ struct TableListPerDB *next;
+} TableListPerDB;
I didn't understand the need for this "per-DB" structure.
Later, you declare "TableSpec *tables;" within the "LogicalRepInfo"
structure (which is per-DB) so you already have the "per-DB" table
list right there. Even if you need to maintain some global static
list, then I imagine you could just put a 'dbname' member in
TableSpec. You don't need a whole new structure to do it.
~~~
create_publication:
5.
+ if (dbinfo->tables == NULL)
+ appendPQExpBuffer(str, "CREATE PUBLICATION %s FOR ALL TABLES", ipubname_esc);
+ else
+ {
+ bool first = true;
+
+ appendPQExpBuffer(str, "CREATE PUBLICATION %s FOR TABLE ", ipubname_esc);
+ for (TableSpec * tbl = dbinfo->tables; tbl != NULL; tbl = tbl->next)
What if '*' is specified for the table name? Should that cause a
"CREATE PUBLICATION ... FOR TABLES IN SCHEMA ..." instead of making a
publication with 100s or more tables in a FOR TABLES?
~~~
6.
+ for (int i = 0; i < PQntuples(tres); i++)
+ {
+ char *escaped_schema = PQescapeIdentifier(conn, PQgetvalue(tres, i, 0),
+ strlen(PQgetvalue(tres, i, 0)));
+ char *escaped_table = PQescapeIdentifier(conn, PQgetvalue(tres, i, 1),
+ strlen(PQgetvalue(tres, i, 1)));
+
+ appendPQExpBuffer(str, "%s%s.%s", first ? "" : ", ",
+ escaped_schema, escaped_table);
+
+ PQfreemem(escaped_schema);
+ PQfreemem(escaped_table);
+ first = false;
+ }
6a.
How about some other simple variables to avoid all the repeated PQgetvalue?
e.g.
char *sch = PQgetvalue(tres, i, 0);
char *tbl = PQgetvalue(tres, i, 1);
char *escaped_schema = PQescapeIdentifier(conn, sch, strlen(sch));
char *escaped_table = PQescapeIdentifier(conn, tbl, strlen(tbl));
~
6b.
Variable 'first' is redundant. Same as checking 'i == 0'.
~~~
7.
+ if (dry_run)
+ {
+ res = PQexec(conn, "BEGIN");
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
Would it be better to use if/else instead of:
if (dry_run)
if (!dry_run)
~~~
main:
8.
+ TableListPerDB * newdb;
if (!simple_string_list_member(&opt.database_names, optarg))
{
simple_string_list_append(&opt.database_names, optarg);
8a.
Compile error. Need {} scope to declare that variable!
~
8b.
This seems a strange muddle of 'd' which represents DATABASE and
TableListPerDB, which is a list of tables (not a database). I have
doubts that most of this linked list code is even necessary for the
'd' case.
~~~
9.
+ case 6:
+ TableSpec * ts = pg_malloc0(sizeof(TableSpec));
+ PQExpBuffer dbbuf;
Compile error. Need {} scope to declare that variable!
~~~
10.
+ if (dotcnt == 2)
+ {
+ ts->pattern_db_regex = NULL;
+ ts->pattern_schema_regex = pg_strdup(schemabuf->data);
+ ts->pattern_table_regex = pg_strdup(namebuf->data);
+ }
+ else if (dotcnt == 1)
+ {
+ ts->pattern_db_regex = NULL;
+ ts->pattern_schema_regex = pg_strdup(dbbuf->data);
+ ts->pattern_table_regex = pg_strdup(schemabuf->data);
+ }
+ else
+ pg_fatal("invalid --table specification: %s", optarg);
10a.
This code seems quite odd to me:
- The DB becomes the schema?
- The schema becomes the table?
Rather than fudging all the names of the --table parts, if you don't
know what they represent up-fron,t probably it is better to call them
just part1,part2,part3.
~~~
10b.
Why is pattern_db_regex always NULL? If it is always NULL why have it at all?
======
.../t/040_pg_createsubscriber.pl
11.
+# Test: Table-level publication creation
+$node_p->safe_psql($db3, "CREATE TABLE public.t1 (id int, val text)");
+$node_p->safe_psql($db3, "CREATE TABLE public.t2 (id int, val text)");
+$node_p->safe_psql($db4,
+ "CREATE TABLE public.t3 (id int, val text, extra int)");
+
IIUC, the schema name is part of the table syntax. So, you should
include test cases for different schemas.
~~~
12.
+# Run pg_createsubscriber with table-level options
+command_ok(
+ [
+ 'pg_createsubscriber',
+ '--verbose',
+ '--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default,
+ '--pgdata' => $node_s2->data_dir,
+ '--publisher-server' => $node_p->connstr($db3),
+ '--socketdir' => $node_s2->host,
+ '--subscriber-port' => $node_s2->port,
+ '--database' => $db3,
+ '--table' => "$db3.public.t1",
+ '--table' => "$db3.public.t2",
+ '--database' => $db4,
+ '--table' => "$db4.public.t3",
+ ],
+ 'pg_createsubscriber runs with table-level publication (existing nodes)');
12a.
This is not really testing the same as what the commit message
describes. e.g. what about a test case where --table does not mention
the db explicitly, so relies on the most recent.
~
12b.
What should happen if the explicitly named DB in --table is not the
same as the most recent --database, even though it is otherwise
correct?
e.g.
'--database' => $db3,
'--table' => "$db3.public.t1",
'--database' => $db4,
'--table' => "$db3.public.t2",
'--table' => "$db4.public.t3",
I quickly tried it and AFAICT this was silently accepted and then the
test failed because it gave unexpected results. It doesn't seem good
behaviour.
~
12c.
(related to 12b/12c).
I became suspicious that the DB name part of the --table option is
completely bogus. And it is. The test still passes OK even after I
write junk in the --table database part, like below.
command_ok(
[
'pg_createsubscriber',
'--verbose',
'--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default,
'--pgdata' => $node_s2->data_dir,
'--publisher-server' => $node_p->connstr($db3),
'--socketdir' => $node_s2->host,
'--subscriber-port' => $node_s2->port,
'--database' => $db3,
'--table' => "$db3.public.t1",
'--table' => "REALLY???.public.t2",
'--database' => $db4,
'--table' => "$db4.public.t3",
],
'pg_createsubscriber runs with table-level publication (existing nodes)');
~~~
13.
+# Get the publication name created by pg_createsubscriber for db3
+my $pubname1 = $node_p->safe_psql(
+ $db3, qq(
+ SELECT pubname FROM pg_publication
+ WHERE pubname LIKE 'pg_createsubscriber_%'
+ ORDER BY pubname LIMIT 1
+));
+
Why don't you just name the publication explicitly in the command?
Then you don't need any of this code to discover the publication name
here.
~~~
14.
+# Check publication tables for db3
+my $actual1 = $node_p->safe_psql(
+ $db3, qq(
+ SELECT pubname || '|public|' || tablename
+ FROM pg_publication_tables
+ WHERE pubname = '$pubname1'
+ ORDER BY tablename
+));
+is($actual1, "$pubname1|public|t1\n$pubname1|public|t2",
+ 'single publication for both tables created successfully on database db3'
+);
What is the point of hardwiring the 'public' in the concatenated
string, and then verifying that it is still there in the result? Why
not hardwire 'banana' instead of 'public' -- it passes the test just
the same.
~~~
15.
+# Get the publication name created by pg_createsubscriber for db4
+my $pubname2 = $node_p->safe_psql(
+ $db4, qq(
+ SELECT pubname FROM pg_publication
+ WHERE pubname LIKE 'pg_createsubscriber_%'
+ ORDER BY pubname LIMIT 1
+));
+
(same as #13 before)
Why don't you simply name the publication explicitly in the command?
Then you don't need any of this code to discover the publication name
here.
~~~
16.
+# Check publication tables for db4
+my $actual2 = $node_p->safe_psql(
+ $db4, qq(
+ SELECT pubname || '|public|' || tablename
+ FROM pg_publication_tables
+ WHERE pubname = '$pubname2'
+ ORDER BY tablename
+));
+is($actual2, "$pubname2|public|t3",
+ 'single publication for t3 created successfully on database db4');
+
(same as #14 before)
What is the point of the hardwired 'public'?
======
Kind Regards,
Peter Smith.
Fujitsu Australia
From | Date | Subject | |
---|---|---|---|
Next Message | Kirill Reshke | 2025-08-15 07:36:00 | Re: Sequence Access Methods, round two |
Previous Message | Michael Paquier | 2025-08-15 07:09:09 | Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options |