Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping a partition table

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, jianingy(dot)yang(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping a partition table
Date: 2019-04-24 23:27:10
Message-ID: 7142.1556148430@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> writes:
> On 2019/04/24 7:03, Tom Lane wrote:
>> ISTM we could work around the problem with the attached, which I think
>> is a good change independently of anything else.

> Agreed.

After thinking about it more, it seems like a bad idea to put the check
in CheckIndexCompatible; that could interfere with any other use of the
function to match up potential child indexes. (I wonder why this test
isn't the same as what DefineIndex uses to spot potential child indexes,
BTW --- it uses a completely separate function CompareIndexInfo, which
seems both wasteful and trouble waiting to happen.)

So I think we should test the relkind in TryReuseIndex, instead.
I think it would be a good idea to *also* do what you suggested
upthread and have DefineIndex clear the field when cloning an
IndexStmt to create a child; no good could possibly come of
passing that down when we intend to create a new index.

In short, I think the right short-term fix is as attached (plus
a new regression test case based on the submitted example).

Longer term, it's clearly bad that we fail to reuse child indexes
in this scenario; the point about mangled comments is minor by
comparison. I'm inclined to think that what we want to do is
*not* recurse when creating the parent index, but to initially
make it NOT VALID, and then do ALTER ATTACH PARTITION with each
re-used child index. This would successfully reproduce the
previous state in case only some of the partitions have attached
indexes, which I don't think works correctly right now.

BTW, I hadn't ever looked closely at what the index reuse code
does, and now that I have, my heart just sinks. I think that
logic needs to be nuked from orbit. RelationPreserveStorage was
never meant to be abused in this way; I invite you to contemplate
whether it's not a problem that it doesn't check the backend and
nestLevel fields of PendingRelDelete entries before killing them.
(In its originally-designed use for mapped rels at transaction end,
that wasn't a problem, but I'm afraid that it may be one now.)

The right way to do this IMO would be something closer to the
heap-swap logic in cluster.c, where we exchange the relfilenodes
of two live indexes, rather than what is happening now. Or for
that matter, do we really need to delete the old indexes at all?

None of that looks very practical for v12, let alone back-patching
to v11, though.

regards, tom lane

Attachment Content-Type Size
avoid-bogus-index-reuse-HEAD.patch text/x-diff 1.8 KB
avoid-bogus-index-reuse-11.patch text/x-diff 1.8 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2019-04-25 00:59:24 Re: BUG #15631: Generated as identity field in a temporary table with on commit drop corrupts system catalogs
Previous Message raf 2019-04-24 23:13:08 Re: bug: evil autoConcat when each string is on new line

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-04-25 00:16:50 Re: Why is it OK for dbsize.c to look at relation files directly?
Previous Message Andres Freund 2019-04-24 23:09:56 Why is it OK for dbsize.c to look at relation files directly?