Re: Add support for specifying tables in pg_createsubscriber.

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add support for specifying tables in pg_createsubscriber.
Date: 2025-09-17 23:53:22
Message-ID: CAHut+PtAJDqgkeUb7x2m9CApxFbqmLpNVHwbjQH-iecBrXEJXA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Shubham,

Here are some v9 review comments.

======
doc/src/sgml/ref/pg_createsubscriber.sgml

--clean:

1.
+ <para>
+ When publications are reused, they will not be dropped during cleanup
+ operations, ensuring they remain available for other uses.
+ Only publications created by
+ <application>pg_createsubscriber</application> on the target server
+ will be cleaned up if the operation fails. Publications on the
+ publisher server are never modified or dropped by cleanup operations.
+ </para>

1a.
OK, I understand now that you want the --clean switch to behave the
same as before and remove all publications, except now it will *not*
drop any publications that are being reused.

I can imagine how that might be convenient to keep those publications
around if the subscriber ends up being promoted to a primary node.
OTOH, it seems a bit peculiar that the behaviour of the --clean option
is now depending on the other --publication option.

I wonder what others think about this feature/quirk? e.g. maybe you
need to introduce a --clean=unused_publications?

~

1b.
Anyway, even if this behaviour is what you wanted, that text
describing --clean needs rewording.

Before this paragraph, the docs still say --clean=publications:
"...causes all other publications replicated from the source server to
be dropped as well."

Which is then immediately contradicted when you wrote:
"When publications are reused, they will not be dropped"

~~~

--publication:

2.
+ <para>
+ Use <option>--dry-run</option> to see which publications will be reused
+ and which will be created before running the actual command.
+ </para>

It seemed a bit strange to say "the actual command" because IMO it's
always an actual command regardless of the options.

SUGGEST:
Use --dry-run to safely preview which publications will be reused and
which will be created.

======
src/bin/pg_basebackup/pg_createsubscriber.c

check_and_drop_publications:

3.
The function comment has not been updated with the new rules. It still
says, "Additionally, if requested, drop all pre-existing
publications."

~~~

4.
/* Drop each publication */
for (int i = 0; i < PQntuples(res); i++)
- drop_publication(conn, PQgetvalue(res, i, 0), dbinfo->dbname,
- &dbinfo->made_publication);
-
+ {
+ if (dbinfo->made_publication)
+ drop_publication(conn, PQgetvalue(res, i, 0), dbinfo->dbname,
+ &dbinfo->made_publication);
+ }
PQclear(res);
}

/*
- * In dry-run mode, we don't create publications, but we still try to drop
- * those to provide necessary information to the user.
+ * Only drop publications that were created by pg_createsubscriber during
+ * this operation. Pre-existing publications are preserved.
*/
if (!drop_all_pubs || dry_run)
- drop_publication(conn, dbinfo->pubname, dbinfo->dbname,
- &dbinfo->made_publication);
+ {
+ if (dbinfo->made_publication)
+ drop_publication(conn, dbinfo->pubname, dbinfo->dbname,
+ &dbinfo->made_publication);
+ else
+ pg_log_info("preserving existing publication \"%s\" in database \"%s\"",
+ dbinfo->pubname, dbinfo->dbname);
+ }

4a.
I always find it difficult to follow the logic of this function.
AFAICT the "dry_mode" logic is already handled within the
drop_publication(), so check_and_drop_publications() can be simplified
by refactoring it:

CURRENT
if (drop_all_pubs)
{
...
}
if (!drop_all_pub || dry_mode)
{
...
}

SUGGEST
if (drop_all_pubs)
{
...
}
else
{
...
}

~

4b.
Why is "preserving existing publication" logged in a --dry-run only?
Shouldn't you see the same logs regardless of --dry-run? That's kind
of the whole point of a dry run, right?

======
.../t/040_pg_createsubscriber.pl

5.
+my $node_s3 = PostgreSQL::Test::Cluster->new('node_s3');
+$node_s3->init_from_backup($node_p, 'backup_tablepub', has_streaming => 1);
+

Should this be done later, where it is needed, combined with the
node_s3->start/stop?

~~~

6.
+# Run pg_createsubscriber on node S3 without --dry-run and --clean option
+# to verify that the existing publications are preserved.
+command_ok(
+ [
+ 'pg_createsubscriber',
+ '--verbose', '--verbose',
+ '--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default,
+ '--pgdata' => $node_s3->data_dir,
+ '--publisher-server' => $node_p->connstr($db1),
+ '--socketdir' => $node_s3->host,
+ '--subscriber-port' => $node_s3->port,
+ '--database' => $db1,
+ '--database' => $db2,
+ '--publication' => 'test_pub_existing',
+ '--publication' => 'test_pub_new',
+ '--clean' => 'publications',
+ ],
+ 'run pg_createsubscriber on node S3');

6a.
That comment wording "without --dry-run and --clean option" is
confusing because it sounds like "without --dry-run" and "without
--clean"

~

6b.
There are other untested combinations like --dry-run with --clean,
which would give more confidence about the logic of function
check_and_drop_publications().

Perhaps this test could have been written using --dry-run, and you
could be checking the logs for the expected "drop" or "preserving"
messages.

~~~

7.
+# Confirm publication created by pg_createsubscriber is removed
+is( $node_s3->safe_psql(
+ $db1,
+ "SELECT COUNT(*) FROM pg_publication WHERE pubname = 'test_pub_new';"
+ ),
+ '0',
+ 'publication created by pg_createsubscriber have been removed');
+

Hmm. Here you are testing the new publication was deleted, so nowhere
is testing what the earlier comment said it was doing "...verify that
the existing publications are preserved.".

======
Kind Regards,
Peter Smith
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2025-09-18 00:00:50 Re: Parallel heap vacuum
Previous Message Michael Paquier 2025-09-17 23:32:22 Re: PgStat_HashKey padding issue when passed by reference