RE: pg_get_publication_tables() output duplicate relid

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: pg_get_publication_tables() output duplicate relid
Date: 2021-11-16 01:50:45
Message-ID: OS0PR01MB5716D9B63334249B0BD8D79194999@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 15, 2021 6:17 PM Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> On Mon, Nov 15, 2021 at 1:48 PM houzj(dot)fnst(at)fujitsu(dot)com
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > Hi hackers,
> >
> > In another thread[1], we found the pg_get_publication_tables function
> > will output duplicate partition relid when adding both child and
> > parent table to the publication(pubviaroot = false).
> >
> > Example:
> > create table tbl1 (a int) partition by range (a); create table
> > tbl1_part1 partition of tbl1 for values from (1) to (10); create table
> > tbl1_part2 partition of tbl1 for values from (10) to (20); create
> > publication pub for table tbl1, tbl1_part1 with
> > (publish_via_partition_root=false);
> >
> > select * from pg_get_publication_tables('pub'); relid
> > -------
> > 16387
> > 16390
> > 16387
> >
> > The reason of the duplicate output is that:
> >
> > pg_get_publication_tables invokes function GetPublicationRelations
> internally.
> > In GetPublicationRelations(), it will add the oid of partition 'tbl1_part1'
> > twice. First time from extracting partitions from the specified parent
> > table 'tbl1', second time from the explicitly specified partition 'tbl1_part1'.
> >
> >
> > I am not sure is this behavior expected as it seems natural for
> > pg_get_publication_tables to return duplicate-free relid list. OTOH,
> > there seems no harm for the current behavior(duplicate output), it
> > doesn't affect the initial sync and change replication when using logical
> replication.
> >
> > Personally, I think it might be better to make the output of
> > pg_get_publication_tables duplicate-free, because the change happened
> > on each output relid will only be replicated once. So, it seems more
> > consistent to output each relid only once.
> >
> > Thoughts ?
> >
> > (Attach a patch which make the output duplicate-free)
> >
> > [1]
> >
> https://www.postgresql.org/message-id/CAJcOf-eURu03QNmD%3D37Ptsxu
> NW4nB
> > GN3G_FdRMBx_tpkeyzDUw%40mail.gmail.com
>
> The users can always specify the distinct clause, see [1]. I don't see any problem
> with the existing way. If required you can specify in the view
> pg_publication_tables documentation that "This view returns multiple rows for
> the same relation, if the relation is child to a parent partition table and if both
> the parent and child are specified in the publication" or something similart.

Thanks for the response.
Yes, I agreed, as I said there's no harm for the current behavior. If most people don't think
It's worth changing the behavior, I am fine with that.

> If at all, the pg_get_publication_tables() is supposed to give unique outputs,
> let's fix it and it is a good idea to specify that the view pg_publication_tables
> returns unique rows even though child and parent partition tables are specified
> in the publication.
>
> I have fdw comments on the patch:
> Let's avoid adding anything to the while
> loop and use list_sort+list_deduplicate_oid (qsort(O(nlogn)+O(n))
>

Thanks for the comments, I will address this if we finally decide to fix it.

>
> 2) Are you sure you want GetPublicationRelations to be returning the unique
> relations? I'm just thinking why can't you just do list_sort+list_deduplicate_oid
> in the caller? I think this makes the function more restrictive. If required, you
> can pass a boolean flag (bool give_unique and specify in the function
> comments and if passed true do list_sort+list_deduplicate_oid at the end of
> GetPublicationRelations ).

I change the GetPublicationRelations because I found no caller need the
duplicate oid, and de-duplicate oid in GetPublicationRelations could reduce
some code change. I think if we need the duplicate oid in the future, we can
add the flag as you suggested.

Best regards,
Hou zj

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message houzj.fnst@fujitsu.com 2021-11-16 01:51:06 RE: pg_get_publication_tables() output duplicate relid
Previous Message Tatsuro Yamada 2021-11-16 00:22:20 Re: Add psql command to list constraints