Re: Add support for specifying tables in pg_createsubscriber.

From: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add support for specifying tables in pg_createsubscriber.
Date: 2025-09-11 08:11:11
Message-ID: CAHv8RjJqDW8VbDtmkVgZ94DL+Bj9DxFpAog9ex9UHSAgkvo=_A@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 11, 2025 at 8:02 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
Hi Peter,

Thank you for the review. I have addressed your comments in the latest patch.

> My first impression of this latest patch is that the new option name
> is not particularly intuitive.
>
> Below are some more review comments for patch v5-0001; any changes to
> the name will propagate all through this with different member names
> and usage, etc, so I did not repeat that same comment over and over.
>
> ======
> Commit message
>
> 1.
> Add a new '--if-not-exists' option to pg_createsubscriber, allowing users to
> reuse existing publications if they are already present, or create them if they
> do not exist.
>
> ~
>
> IMO, this option name ("if-not-exists") gives the user no clues as to
> its purpose.
>
> I suppose you wanted the user to associate this option with a CREATE
> PUBLICATION, so that they can deduce that it acts internally like a
> "CREATE PUBLICATION pub ... IF NOT EXISTS" (if there was such a
> thing), I don't see how a user is able to find any meaning at all from
> this vague name. The only way they can know what this means is to read
> all the documentation.
>
> e.g.
> - if WHAT doesn't exist?
> - if <??> not exists, then do WHAT?
>
> Alternatives like "--reuse-pub-if-exists" or "--reuse-existing-pubs"
> might be easier to understand.
>

I agree that --if-not-exists may not be very intuitive, as it doesn’t
clearly convey the intended behavior without reading the
documentation.
For this version, I have renamed the option to
"--reuse-existing-publications", which I believe makes the purpose
clearer: if the specified publication already exists, it will be
reused; otherwise, a new one will be created. This avoids ambiguity
while still keeping the semantics aligned with the original intent.

> ~~~
>
> 2.
> Key features:
> 1. New '--if-not-exists' flag changes the behavior of '--publication'.
> 2. If the publication exists, it is reused.
> 3. If it does not exist, it is created automatically.
> 4. Supports per-database specification, consistent with other options.
> 5. Avoids the complexity of option conflicts and count-matching rules.
> 6. Provides semantics consistent with SQL’s IF NOT EXISTS syntax.
>
> ~
>
> Some of those "features" seem wrong, and some seem inappropriate for
> the commit message:
>
> e.g. TBH, I doubt that "4. Supports per-database specification" can
> work as claimed here. Supposing I have multiple databases, then I
> cannot see how I can say "if-not-exists" for some databases but not
> for others.
>
> e.g. The "5. Avoids the complexity..." seems like a reason why an
> alternative design was rejected; Why does this comment belong here?
>
> e.g. The "6. Provides semantics..." seems like your internal
> implementation justification, whereas IMO this patch should be more
> focused on making any new command-line option more
> intuitive/meaningful from the point of view of the user.
>

Updated the commit message.

> ======
> doc/src/sgml/ref/pg_createsubscriber.sgml
>
> 3.
> + <para>
> + This option provides flexibility for mixed scenarios where some
> + publications may already exist while others need to be created.
> + It eliminates the need to know in advance which publications exist on
> + the publisher.
> + </para>
>
> Hmm. Is that statement ("It eliminates the need to know...") correct?
> I feel it should say the total opposite of this.
>
> For example, IMO now the user needs to be extra careful because if
> they say --publication=pub1 --if-not-exists then they have to be 100%
> sure if 'pub1' exists or does not exist, otherwise they might
> accidentally be making pub1 FOR ALL TABLES when really they expected
> they were reusing some existing publication 'pub1', or vice versa.
>
> In fact, I think the docs should go further and *recommend* that the
> user never uses this new option without firstly doing a --dry-run to
> verify they are actually getting the publications that they assume
> they are getting.
>

Fixed.

> ~~~
>
> 4.
> + <para>
> + This option follows the same semantics as SQL
> + <literal>IF NOT EXISTS</literal> clauses, providing consistent behavior
> + with other PostgreSQL tools.
> + </para>
>
> Why even say this in the docs? What other "tools" are you referring to?
>

Fixed.

> ======
> src/bin/pg_basebackup/pg_createsubscriber.c
>
> 5.
> bool made_publication; /* publication was created */
> + bool publication_existed; /* publication existed before we
> + * started */
> };
>
> This new member feels redundant to me.
>
> AFAIK, the publication will either exist already, OR it will be
> created/made by pg_createsubscriber. You don't need 2 booleans to
> represent that.
> At most, maybe all you want is another local variable in the function
> setup_publisher().
>

Fixed.

> ~~~
>
> cleanup_objects_atexit:
>
> (same comments as posted in my v4 review)
>

Fixed.

> 6a.
> - if (dbinfo->made_publication)
> + if (dbinfo->made_publication && !dbinfo->publication_existed)
>
> Same as above review comment #5. It's either made or it's not made,
> right? Why the extra boolean?
>
> ~
>
> 6b.
> - if (dbinfo->made_publication)
> + if (dbinfo->made_publication && !dbinfo->publication_existed)
>
> Ditto.
>

Fixed.

> ~~~
>
> check_publication_exists:
>
> 7.
> +/*
> + * Add function to check if publication exists.
> + */
>
> Function comment should not say "Add function".
>

Fixed.

> ~
>
> 8.
> + PGresult *res;
> + bool exists = false;
> + char *query;
>
> Redundant assignment?
>

Fixed.

> ~~~
>
> setup_publisher:
>
> 9.
> + /*
> + * Check if publication already exists when --if-not-exists is
> + * specified
> + */
> + if (opt->if_not_exists)
> + {
> + dbinfo[i].publication_existed = check_publication_exists(conn,
> dbinfo[i].pubname, dbinfo[i].dbname);
> + if (dbinfo[i].publication_existed)
> + {
> + pg_log_info("using existing publication \"%s\" in database \"%s\"",
> + dbinfo[i].pubname, dbinfo[i].dbname);
> + }
> + }
> +
> /*
> * Create publication on publisher. This step should be executed
> * *before* promoting the subscriber to avoid any transactions between
> * consistent LSN and the new publication rows (such transactions
> * wouldn't see the new publication rows resulting in an error).
> */
> - create_publication(conn, &dbinfo[i]);
> + if (opt->if_not_exists && dbinfo[i].publication_existed)
> + dbinfo[i].made_publication = false;
> + else
> + {
> + create_publication(conn, &dbinfo[i]);
> + if (opt->if_not_exists)
> + {
> + pg_log_info("created publication \"%s\" in database \"%s\"",
> + dbinfo[i].pubname, dbinfo[i].dbname);
> + }
> + }
>
> This all seems way more complicated than it needs to be. e.g. I doubt
> that you need to be checking opt->if_not_exists 3 times.
>
> Simpler logic might be more like below:
>
> bool make_pub = true;
>
> if (opt->if_not_exists && check_publication_exists(...))
> {
> pg_log_info("using existing...");
> make_pub = false;
> }
>
> ...
>
> if (make_pub)
> {
> create_publication(conn, &dbinfo[i]);
> pg_log_info("created publication...");
> }
>
> dbinfo[i].made_publication = make_pub;
>

Fixed.

> ======
> .../t/040_pg_createsubscriber.pl
>
> 10.
> +# Initialize node_s2 as a fresh standby of node_p for table-level
> +# publication test.
>
> I don't know that you should still be calling this a "table-level"
> test. Maybe it's more like an "existing publication" test now?
>

Fixed.

> ~~~
>
> 11.
> +is( $result, qq({test_pub_existing}
> +{test_pub_new}),
> + "subscriptions use the correct publications with --if-not-exists in
> $db1 and $db2"
> +);
>
> Something seems broken. I deliberately caused an error to occur to see
> what would happen, and the substitutions of $db1 and $db2 went crazy.
>
> # Failed test 'subscriptions use the correct publications with
> --if-not-exists in regression\"\
>
>
> ¬ !"\#$%&'()*+,-\\"\\\
> and regression./0123456789:;<=>?(at)ABCDEFGHIJKLMNOPQRSTUVWXYZ'
> # at t/040_pg_createsubscriber.pl line 614.
>

Fixed.

The attached patch contains the suggested changes.

Thanks and regards,
Shubham Khanna,

Attachment Content-Type Size
v6-0001-Support-existing-publications-in-pg_createsubscri.patch application/octet-stream 14.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2025-09-11 08:32:40 Re: Adding basic NUMA awareness
Previous Message Michael Paquier 2025-09-11 07:55:14 Re: Fix inconsistencies with code and beautify xlog structures description and fin hash_xlog.h