Re: Subscription tests fail under CLOBBER_CACHE_ALWAYS

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Subscription tests fail under CLOBBER_CACHE_ALWAYS
Date: 2021-05-19 03:32:41
Message-ID: CA+HiwqH+WShxk3Aeh64yYra0oSM1G8ir3B1efG-y0p-v=54Kog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 19, 2021 at 12:04 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Tue, May 18, 2021 at 07:42:08PM -0400, Tom Lane wrote:
> > I count three distinct bugs that were exposed by this attempt:
> >
> > 1. In the part of 013_partition.pl that tests firing AFTER
> > triggers on partitioned tables, we have a case of continuing
> > to access a relcache entry that's already been closed.
> > (I'm not quite sure why prion's -DRELCACHE_FORCE_RELEASE
> > hasn't exposed this.) It looks to me like instead we had
> > a relcache reference leak before f3b141c48, but now, the
> > only relcache reference count on a partition child table
> > is dropped by ExecCleanupTupleRouting, which logical/worker.c
> > invokes before it fires triggers on that table. Kaboom.

Oops.

> > This might go away if worker.c weren't so creatively different
> > from the other code paths concerned with executor shutdown.
>
> The tuple routing has made the whole worker logic messier by a larger
> degree compared to when this stuff was only able to apply DMLs changes
> on the partition leaves. I know that it is not that great to be more
> creative here, but we need to make sure that AfterTriggerEndQuery() is
> moved before ExecCleanupTupleRouting(). We could either keep the
> ExecCleanupTupleRouting() calls as they are now, and call
> AfterTriggerEndQuery() in more code paths.

Yeah, that's what I thought to propose doing too. Your patch looks
enough in that regard. Thanks for writing it.

> Or we could have one
> PartitionTupleRouting and one ModifyTableState created beforehand
> in create_estate_for_relation() if applying the change on a
> partitioned table but that means manipulating more structures across
> more layers of this code.

Yeah, that seems like too much change to me too.

> Something like the attached fixes the
> problem for me, but honestly it does not help in clarifying this code
> more. I was not patient enough to wait for CLOBBER_CACHE_ALWAYS to
> initialize the nodes in the TAP tests, so I have tested that with a
> setup initialized with a non-clobber build, and reproduced the problem
> with CLOBBER_CACHE_ALWAYS builds on those same nodes.

I have checked the fix works with a CLOBBER_CACHE_ALWAYS build.

> You are right that this is not a problem of 14~. I can reproduce the
> problem on 13 as well, and we have no coverage for tuple routing with
> triggers on this branch, so this would never have been stressed in the
> buildfarm. There is a good argument to be made here in cherry-picking
> 2ecfeda3 to REL_13_STABLE.

+1

--
Amit Langote
EDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-05-19 03:46:25 Re: Subscription tests fail under CLOBBER_CACHE_ALWAYS
Previous Message Michael Paquier 2021-05-19 03:26:07 Re: Multiple pg_waldump --rmgr options