Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

From: japin <japinli(at)hotmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
Date: 2021-01-15 09:15:47
Message-ID: MEYP282MB1669B2392633CDD90D38A890B6A70@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Fri, 15 Jan 2021 at 15:49, japin <japinli(at)hotmail(dot)com> wrote:
> On Fri, 15 Jan 2021 at 14:50, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>> On Fri, Jan 15, 2021 at 11:33 AM Hou, Zhijie <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com> wrote:
>>>
>>> > On Thu, Jan 14, 2021 at 5:36 PM Li Japin <japinli(at)hotmail(dot)com> wrote
>>> > > Do we really need to access PUBLICATIONRELMAP in this patch? What if
>>> > > we just set it to false in the else condition of (if (publish &&
>>> > > (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))
>>> > >
>>> > > Thank for you review. I agree with you, it doesn’t need to access
>>> > > PUBLICATIONRELMAP, since We already get the publication oid in
>>> > > GetRelationPublications(relid) [1], which also access
>>> > > PUBLICATIONRELMAP. If the current pub->oid does not in pubids, the
>>> > publish is false, so we do not need publish the table.
>>> >
>>> > +1. This is enough.
>>> >
>>> > > I have another question, the data->publications is a list, when did it
>>> > has more then one items?
>>> >
>>> > IIUC, when the single table is associated with multiple publications, then
>>> > data->publications will have multiple entries. Though I have not tried,
>>> > we can try having two or three publications for the same table and verify
>>> > that.
>>> >
>>> > > 0001 - change as you suggest.
>>> > > 0002 - does not change.
>>>
>>>
>>> Hi
>>>
>>> In 0001 patch
>>>
>>> The comments says " Relation is not associated with the publication anymore "
>>> That comment was accurate when check is_relation_part_of_publication() in the last patch.
>>>
>>> But when put the code in the else branch, not only the case in the comments(Relation is dropped),
>>> Some other case such as "partitioned tables" will hit else branch too.
>>>
>>> Do you think we need fix the comment a little to make it accurate?
>>
>> Thanks, that comment needs a rework, in fact the else condition
>> also(because we may fall into else branch even when publish is true
>> and (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot) is
>> false, but our intention(to fix the bug reported here) is to have the
>> else condition only when publish is false. And also instead of just
>> relying on the publish variable which can get set even when the
>> relation is not in the publication but ancestor_published is true, we
>> can just have something like below to fix the bug reported here:
>>
>> if (publish &&
>> (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot))
>> {
>> entry->pubactions.pubinsert |= pub->pubactions.pubinsert;
>> entry->pubactions.pubupdate |= pub->pubactions.pubupdate;
>> entry->pubactions.pubdelete |= pub->pubactions.pubdelete;
>> entry->pubactions.pubtruncate |= pub->pubactions.pubtruncate;
>> }
>> else if (!list_member_oid(pubids, pub->oid))
>> {
>> /*
>> * Relation is not associated with the publication anymore i.e
>> * it would have been dropped from the publication. So we don't
>> * need to publish the changes for it.
>> */
>> entry->pubactions.pubinsert = false;
>> entry->pubactions.pubupdate = false;
>> entry->pubactions.pubdelete = false;
>> entry->pubactions.pubtruncate = false;
>> }
>>
>> So this way, the fix only focuses on what we have reported here and it
>> doesn't cause any regressions may be, and the existing comment becomes
>> appropriate.
>>
>> Thoughts?
>>
>
> I'm sorry, the 0001 patch is totally wrong. If we have only one publication, it works,
> however, this is a special case. If we have multiple publication in one subscription,
> it doesn't work. Here is a usecase.
>
> Step (1)
> -- On publisher
> CREATE TABLE t1 (a int);
> CREATE TABLE t2 (a int);
> CREATE PUBLICATION mypub1 FOR TABLE t1 WITH (publish='insert');
> CREATE PUBLICATION mypub2 FOR TABLE t2;
>
> Step (2)
> -- On subscriber
> CREATE TABLE t1 (a int);
> CREATE TABLE t2 (a int);
> CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION mypub1, mypub2;
>
> Step (3)
> -- On publisher
> INSERT INTO t1 VALUES (1);
>
> Step (4)
> -- On subscriber
> SELECT * FROM t1;
>
> In step (4), we cannot get the records, because there has two publications in
> data->publications, so we will check one by one.
> The first publication is "mypub1" and entry->pubactions.pubinsert will be set
> true, however, in the second round, the publication is "mypub2", and we cannot
> find pub->oid in pubids (only oid of "mypub1"), so it will set all entry->pubactions
> to false, and nothing will be replicate to subscriber.
>
> My apologies!

As I mentioned in previous thread, if there has multiple publications in
single subscription, it might lead data loss.

When the entry got invalidated in rel_sync_cache_publication_cb(), I think we
should mark the pubactions to false, and let get_rel_sync_entry() recalculate
the pubactions.

0001 - modify as above described.
0002 - do not change.

Any thought?

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

Attachment Content-Type Size
v3-0001-Fix-ALTER-PUBLICATION.DROP-TABLE-behaviour.patch text/x-patch 1.5 KB
v3-0002-Invalidate-relation-map-cache-in-subscriber-sysca.patch text/x-patch 3.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-01-15 09:43:18 Re: Parallel INSERT (INTO ... SELECT ...)
Previous Message kuroda.hayato@fujitsu.com 2021-01-15 09:10:34 RE: ResourceOwner refactoring