Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping a partition table

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, jianingy(dot)yang(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping a partition table
Date: 2019-04-25 02:21:41
Message-ID: 6bc8cb73-87ca-d069-fc4b-23e684fa1d74@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On 2019/04/25 8:27, Tom Lane wrote:
> Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> writes:
>> On 2019/04/24 7:03, Tom Lane wrote:
>>> ISTM we could work around the problem with the attached, which I think
>>> is a good change independently of anything else.
>
>> Agreed.
>
> After thinking about it more, it seems like a bad idea to put the check
> in CheckIndexCompatible; that could interfere with any other use of the
> function to match up potential child indexes.

CheckIndexCompatible() seems to be invented solely for the purposes of
post-ALTER-COLUMN-TYPE, which is what I get from the header comment of
that function:

* This is tailored to the needs of ALTER TABLE ALTER TYPE, which recreates
* any indexes that depended on a changing column from their pg_get_indexdef
* or pg_get_constraintdef definitions. We omit some of the sanity checks of
* DefineIndex. We assume that the old and new indexes have the same number
* of columns and that if one has an expression column or predicate, both do.
* Errors arising from the attribute list still apply.

Given that, it may have been better to keep the function local to
tablecmds.c to avoid potential misuse; I see no other callers of this
function beside TryReuseIndex(), which in turn is only used by
post-ALTER-COLUMN-TYPE processing.

> (I wonder why this test
> isn't the same as what DefineIndex uses to spot potential child indexes,
> BTW --- it uses a completely separate function CompareIndexInfo, which
> seems both wasteful and trouble waiting to happen.)

I don't think that CheckIndexCompatible's job is to spot child indexes
compatible with a given parent index; that's CompareIndexInfo's job. The
latter was invented for the partition index DDL code, complete with
provisions to do mapping between parent and child attributes before
matching if their TupleDescs are different.

CheckIndexCompatible, as mentioned above, is to do the errands of
ATPostAlterTypeParse().

/*
* CheckIndexCompatible
* Determine whether an existing index definition is compatible with a
* prospective index definition, such that the existing index storage
* could become the storage of the new index, avoiding a rebuild.

So, maybe a bigger (different?) charter than CompareIndexInfo's, or so I
think. Although, they may be able share the code.

> So I think we should test the relkind in TryReuseIndex, instead.

Which would work too. Or we could just not call TryReuseIndex() if the
the index in question is partitioned.

> I think it would be a good idea to *also* do what you suggested
> upthread and have DefineIndex clear the field when cloning an
> IndexStmt to create a child; no good could possibly come of
> passing that down when we intend to create a new index.

Note that oldNode is only set by TryReuseIndex() today. Being careful in
DefineIndex might be a good idea anyway.

> In short, I think the right short-term fix is as attached (plus
> a new regression test case based on the submitted example).

Sounds good.

> Longer term, it's clearly bad that we fail to reuse child indexes
> in this scenario; the point about mangled comments is minor by
> comparison.

Agreed.

> I'm inclined to think that what we want to do is
> *not* recurse when creating the parent index, but to initially
> make it NOT VALID, and then do ALTER ATTACH PARTITION with each
> re-used child index. This would successfully reproduce the
> previous state in case only some of the partitions have attached
> indexes, which I don't think works correctly right now.

Well, an index in the parent must be present in all partitions, so I don't
understand which case you're thinking of.

> BTW, I hadn't ever looked closely at what the index reuse code
> does, and now that I have, my heart just sinks. I think that
> logic needs to be nuked from orbit. RelationPreserveStorage was
> never meant to be abused in this way; I invite you to contemplate
> whether it's not a problem that it doesn't check the backend and
> nestLevel fields of PendingRelDelete entries before killing them.
> (In its originally-designed use for mapped rels at transaction end,
> that wasn't a problem, but I'm afraid that it may be one now.)
>
> The right way to do this IMO would be something closer to the
> heap-swap logic in cluster.c, where we exchange the relfilenodes
> of two live indexes, rather than what is happening now. Or for
> that matter, do we really need to delete the old indexes at all?

Yeah, we wouldn't need TryReuseIndex and subsequent
RelationPreserveStorage if we hadn't dropped the old indexes to begin
with. Instead, in ATPostAlterTypeParse, check if the index after ALTER
TYPE is still incompatible (CheckIndexCompatible) and if it is, don't add
a new AT_ReAddIndex command. If it's not, *then* drop the index, and
recreate the index from scratch using an IndexStmt generated from the old
index definition. I guess We can get rid of IndexStmt.oldNode too.

> None of that looks very practical for v12, let alone back-patching
> to v11, though.

Yes.

Thanks,
Amit

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Paul Mansueto Namuag 2019-04-25 06:27:14 Re: CREATE SUBSCRIPTION fails with long passwords
Previous Message Michael Paquier 2019-04-25 00:59:24 Re: BUG #15631: Generated as identity field in a temporary table with on commit drop corrupts system catalogs

Browse pgsql-hackers by date

  From Date Subject
Next Message Paul Guo 2019-04-25 02:41:31 Re: [Patch] Check file type before calling AllocateFile() for files out of pg data directory to avoid potential issues (e.g. hang).
Previous Message alex lock 2019-04-25 01:57:04 Re: Help to review the with X cursor option.