Re: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases
Date: 2023-09-04 14:42:19
Message-ID: CAD21AoDfHpBwC6Vk8VAKsiDo0GM8vxq-pNJbgKbZcCPyfgrBCw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Mon, Sep 4, 2023 at 9:38 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Wednesday, July 5, 2023 2:53 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> Hi,
>
> > Thanks for reviewing the patch. Pushed.
>
> My colleague Vignesh noticed that the newly added test cases were failing in BF animal sungazer[1].

Thank you for reporting!

>
> The test failed to drop the slot which is active on publisher.
>
> --error-log--
> This failure is because pg_drop_replication_slot fails with "replication slot "test_tab2_sub" is active for PID 55771638":
> 2023-09-02 09:00:04.806 UTC [12910732:4] 026_stats.pl LOG: statement: SELECT pg_drop_replication_slot('test_tab2_sub')
> 2023-09-02 09:00:04.807 UTC [12910732:5] 026_stats.pl ERROR: replication slot "test_tab2_sub" is active for PID 55771638
> 2023-09-02 09:00:04.807 UTC [12910732:6] 026_stats.pl STATEMENT: SELECT pg_drop_replication_slot('test_tab2_sub')
> -------------
>
> I the reason is that the test DISABLEd the subscription before dropping the
> slot, while "ALTER SUBSCRIPTION DISABLE" doesn't wait for the walsender to
> release the slot, so it's possible that the walsender is still alive when calling
> pg_drop_replication_slot() to drop the slot on publisher(pg_drop_xxxslot()
> doesn't wait for slot to be released).

I agree with your analysis.

>
> I think we can wait for the slot to become inactive before dropping like:
> $node_primary->poll_query_until('otherdb',
> "SELECT NOT EXISTS (SELECT 1 FROM pg_replication_slots WHERE active_pid IS NOT NULL)"
> )
>

I prefer this approach but it would be better to specify the slot name
in the query.

> Or we can just don't drop the slot as it’s the last testcase.

Since we might add other tests after that in the future, I think it's
better to drop the replication slot (and subscription).

>
> Another thing might be worth considering is we can add one parameter in
> pg_drop_replication_slot() to let user control whether to wait or not, and the
> test can be fixed as well by passing nowait=false to the func.

While it could be useful in general we cannot use this approach for
this issue since it cannot be backpatched to older branches. We might
want to discuss it on a new thread.

I've attached a draft patch to fix this issue. Feedback is very welcome.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
fix.patch application/octet-stream 849 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nazir Bilal Yavuz 2023-09-04 14:43:49 Create shorthand for including all extra tests
Previous Message Aleksander Alekseev 2023-09-04 14:41:43 Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)