Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

From: japin <japinli(at)hotmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
Date: 2021-01-13 07:09:37
Message-ID: MEYP282MB1669D9136B6A1EC71C4F7082B6A90@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
Lists: pgsql-hackers


On Wed, 13 Jan 2021 at 13:26, Amit Kapila wrote:
> On Tue, Jan 12, 2021 at 4:59 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>>
>> On Tue, Jan 12, 2021 at 12:06 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> > > Here's my analysis:
>> > > 1) in the publisher, alter publication drop table successfully
>> > > removes(PublicationDropTables) the table from the catalogue
>> > > pg_publication_rel
>> > > 2) in the subscriber, alter subscription refresh publication
>> > > successfully removes the table from the catalogue pg_subscription_rel
>> > > (AlterSubscription_refresh->RemoveSubscriptionRel)
>> > > so far so good
>> > >
>> >
>> > Here, it should register the worker to stop on commit, and then on
>> > commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
>> > Once the apply worker is stopped, the corresponding WALSender will
>> > also be stopped. Something here is not happening as per expected
>> > behavior.
>>
>> On the subscriber, an entry for worker stop is created in AlterSubscription_refresh --> logicalrep_worker_stop_at_commit. At the end of txn, in AtEOXact_ApplyLauncher, we try to stop that worker, but it cannot be stopped because logicalrep_worker_find returns null (AtEOXact_ApplyLauncher --> logicalrep_worker_stop --> logicalrep_worker_find). The worker entry for that subscriber is having relid as 0 [1], due to which the following if condition will not be hit. The apply worker on the subscriber related to the subscription on which refresh publication was run is not closed. It looks like relid 0 is valid because it will be applicable only during the table sync phase, the comment in the LogicalRepWorker structure says that.
>>
>> And also, I think, expecting the apply worker to be closed this way doesn't make sense because the apply worker is a per-subscription base, and the subscription can have other tables too.
>>
>
> Okay, that makes sense. As responded to Li Japin, let's focus on
> figuring out why we are sending the changes from the publisher node in
> some cases and not in other cases.

After some analysis, I found that changes to the dropped table are always
sent to the subscriber. The difference is that if we drop the table from the
publication and then refresh the publication (on the subscriber), the
LogicalRepRelMapEntry seen in should_apply_changes_for_rel() has its state set
to SUBREL_STATE_UNKNOWN.

(gdb) p *rel
$2 = {remoterel = {remoteid = 16410, nspname = 0x5564fb0177c0 "public",
relname = 0x5564fb0177a0 "t1", natts = 1, attnames = 0x5564fb0177e0, atttyps = 0x5564fb017780,
replident = 100 'd', relkind = 0 '\000', attkeys = 0x0}, localrelvalid = true,
localreloid = 16412, localrel = 0x7f78705da1b8, attrmap = 0x5564fb017800, updatable = false,
*state = 0 '\000'*, statelsn = 0}

If we insert data between dropping the table from the publication and
refreshing the publication, the LogicalRepRelMapEntry state is always
SUBREL_STATE_READY.

(gdb) p *rel
$2 = {remoterel = {remoteid = 16410, nspname = 0x5564fb0177c0 "public",
relname = 0x5564fb0177a0 "t1", natts = 1, attnames = 0x5564fb0177e0, atttyps = 0x5564fb017780,
replident = 100 'd', relkind = 0 '\000', attkeys = 0x0}, localrelvalid = true,
localreloid = 16412, localrel = 0x7f78705d9d38, attrmap = 0x5564fb017800, updatable = false,
*state = 114 'r'*, statelsn = 23545672}
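
To make the difference between the two dumps concrete, here is a minimal,
self-contained sketch of the apply-worker branch of the check, written from
my reading of should_apply_changes_for_rel() and not copied from the source.
RelMapEntrySketch and apply_worker_should_apply() are hypothetical names used
only for illustration, the 's' value for SUBREL_STATE_SYNCDONE is assumed
from pg_subscription_rel.h, and the table-sync-worker branch is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* State codes: '\0' and 'r' match the gdb dumps; 's' is an assumption. */
#define SUBREL_STATE_UNKNOWN   '\0'
#define SUBREL_STATE_SYNCDONE  's'
#define SUBREL_STATE_READY     'r'

typedef uint64_t XLogRecPtr;

/* Hypothetical, trimmed-down stand-in for LogicalRepRelMapEntry. */
typedef struct RelMapEntrySketch
{
    char        state;      /* cached state from pg_subscription_rel */
    XLogRecPtr  statelsn;   /* LSN at which that state was reached */
} RelMapEntrySketch;

/*
 * Sketch of the apply-worker decision: apply the change only if the
 * relation is READY, or SYNCDONE with a state LSN not ahead of the
 * remote transaction's final LSN.  Any other state means "skip".
 */
static bool
apply_worker_should_apply(const RelMapEntrySketch *rel,
                          XLogRecPtr remote_final_lsn)
{
    return rel->state == SUBREL_STATE_READY ||
        (rel->state == SUBREL_STATE_SYNCDONE &&
         rel->statelsn <= remote_final_lsn);
}
```

Under this sketch, the first dump (state = 0) makes the apply worker skip the
change, while the second dump (state = 'r') makes it apply the change, which
would match the behavior seen on the subscriber.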

I will dig into why the state of LogicalRepRelMapEntry doesn't change in the
second case.

Any suggestions are welcome!

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
